Episode 4 DB2 pureScale Performance Webinar Oct 2010



Slides from Episode 4 of the DB2 pureScale webcast series with the IBM lab team.


  1. DB2 pureScale Performance
     Steve Rees – srees@ca.ibm.com
     Oct 19, 2010
  2. Agenda (Copyright IBM 2010)
     – DB2 pureScale technology review
     – RDMA and low-latency interconnect
     – Monitoring and tuning bufferpools in pureScale
     – Architectural features for top performance
     – Performance metrics
  3. DB2 pureScale: Technology Review
     [Diagram: clients see a single database view; four members, each with its own log, share storage over the cluster interconnect; primary and secondary Cluster Caching Facilities]
     DB2 engine runs on several host computers
     – Hosts co-operate with each other to provide coherent access to the database from any member
     Data sharing architecture
     – Shared access to the database
     – Members write to their own logs
     – Logs are accessible from another host (used during recovery)
     Cluster Caching Facility (CF) technology from STG
     – Efficient global locking and buffer management
     – Synchronous duplexing to a secondary CF ensures availability
     Low-latency, high-speed interconnect
     – Special optimizations provide significant advantages on RDMA-capable interconnects like InfiniBand
     Clients connect anywhere, and see a single database
     – Clients connect into any member
     – Automatic load balancing and client reroute may change the underlying physical member to which a client is connected
     Integrated cluster services
     – Failure detection, recovery automation, cluster file system
     – In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)
     Leverages IBM's System z Sysplex experience and know-how
  4. DB2 pureScale and the low-latency interconnect
     InfiniBand & uDAPL provide the low-latency RDMA infrastructure exploited by pureScale
     pureScale currently uses DDR and QDR IB adapters according to platform
     – Peak throughput of about 2-4 M messages per second
     – Message latencies in the 10s of microseconds or even lower
     The InfiniBand development roadmap indicates continued increases in bit rates
     [Figure: InfiniBand roadmap from www.infinibandta.org]
  5. Two-level page buffering – data consistency & improved performance
     The local bufferpool (LBP) caches both read-only and updated pages for that member
     The shared GBP contains references to every page in all LBPs across the cluster
     – References ensure consistency across members – who's interested in which pages, in case the pages are updated
     The GBP also contains copies of all updated pages from the LBPs
     – Sent from the LBP at transaction commit time
     – Stored in the GBP & available to members on demand
     – A 30 µs page read request over InfiniBand from the GBP can be more than 100x faster than reading from disk
     Statistics are kept for tuning
     – Found in LBP vs. found in GBP vs. read from disk
     – Useful in tuning GBP / LBP sizes
     [Diagram: members M1-M3 and the CF; a ~30 µs GBP page read replaces a ~5000 µs disk read – expensive disk reads from M1, M2 not required; get the modified page from the CF]
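The "more than 100x" claim on this slide follows directly from the latencies in the diagram. The figures below are the slide's round numbers, not measurements:

```python
# Latency comparison from slide 5 (round figures from the diagram,
# not measured values).
disk_read_us = 5000   # ~5 ms random disk read
gbp_read_us = 30      # RDMA page read from the GBP over InfiniBand

speedup = disk_read_us / gbp_read_us
print(f"{speedup:.0f}x")   # well over the "100x" quoted on the slide
```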
  6. pureScale bufferpool monitoring and tuning
     Familiar DB2 hit ratio calculations are useful with pureScale
     – HR = (logical reads – physical reads) / logical reads
       e.g. (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
     – As usual, physical reads come from disk and logical reads from the bufferpool (in pureScale, this means either the LBP or the GBP)
       e.g., pool_data_l_reads = pool_data_lbp_pages_found + pool_data_gbp_l_reads
     New metrics in pureScale support breaking this down into LBP & GBP amounts
     – pool_data_lbp_pages_found = logical data reads satisfied by the LBP
       • i.e., we needed a page, and it was present & valid in the LBP
     – pool_data_gbp_l_reads = logical data reads attempted at the GBP
       • i.e., the page was either not present or not valid in the LBP, so we needed to go to the GBP
     – pool_data_gbp_p_reads = physical data reads due to the page not being present in either the LBP or the GBP
       • Essentially the same as the non-pureScale pool_data_p_reads
     – pool_data_gbp_invalid_pages = number of GBP data read attempts due to an LBP page being present but marked invalid
       • An indicator of the rate of GBP updates & their impact on the LBP
     Of course, there are equivalent index metrics too
  7. pureScale bufferpool monitoring
     Overall (and non-pureScale) hit ratio
     – (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
     – Great values: 95% for index, 90% for data
     – Good values: 80-90% for index, 75-85% for data
     LBP hit ratio
     – (pool_data_lbp_pages_found / pool_data_l_reads) * 100%
     – Generally lower than the overall hit ratio, since it excludes GBP hits
     – Factors which may affect it, other than LBP size:
       • Increases with a greater portion of read activity in the system – decreasing probability that LBP copies of the page have been invalidated
       • May decrease with cluster size – increasing probability that another member has invalidated the LBP page
     GBP hit ratio
     – (pool_data_gbp_l_reads – pool_data_gbp_p_reads) / pool_data_gbp_l_reads
     – A hit here is a read of a previously modified page, so hit ratios are typically quite low
       • An overall (LBP+GBP) H/R in the high 90's can correspond to a GBP H/R in the low 80's
     – Factors which may affect it, other than GBP size:
       • Decreases with a greater portion of read activity
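The three ratios above can be computed directly from the monitor elements named on the slide. A minimal sketch, using hypothetical sample counts (as might be read from a DB2 monitoring table function) rather than real monitor data:

```python
# Hit-ratio arithmetic from slide 7. The monitor-element names match
# the slide; the sample values below are hypothetical.
def hit_ratios(l_reads, p_reads, lbp_pages_found, gbp_l_reads, gbp_p_reads):
    """Return (overall, LBP, GBP) data hit ratios as percentages."""
    overall = 100.0 * (l_reads - p_reads) / l_reads
    lbp = 100.0 * lbp_pages_found / l_reads
    gbp = 100.0 * (gbp_l_reads - gbp_p_reads) / gbp_l_reads
    return overall, lbp, gbp

# Hypothetical sample: 1,000,000 logical reads, 900,000 found valid in
# the LBP; the remaining 100,000 went to the GBP, and 20,000 of those
# missed there too and had to come from disk.
overall, lbp, gbp = hit_ratios(
    l_reads=1_000_000,
    p_reads=20_000,            # in pureScale, same as gbp_p_reads
    lbp_pages_found=900_000,
    gbp_l_reads=100_000,       # = l_reads - lbp_pages_found
    gbp_p_reads=20_000,
)
print(f"overall {overall:.1f}%  LBP {lbp:.1f}%  GBP {gbp:.1f}%")
```

With these numbers the overall H/R is 98% while the GBP H/R is only 80%, illustrating the slide's point that a high-90's overall ratio can coexist with a low-80's GBP ratio.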
  8. pureScale bufferpool tuning
     Step 1: typical rule of thumb for GBP size = 35-40% of Σ( all members' LBP sizes )
     – e.g. 4 members with an LBP size of 1M pages each -> GBP size of 1.4 to 1.6M pages
     – NB – don't forget, the GBP page size is always 4 KB, no matter what the LBP page size is
     – If your workload is very read-heavy (e.g. 90% read), the initial GBP allocation could be in the 20-30% range
     – For 2-member clusters, you may want to start with 40-50% of total LBP, vs. 35-40%
     Step 2: monitor the overall BP hit ratio as usual, with pool_data_l_reads, pool_data_p_reads, etc.
     – Meets your goals? If yes, then done!
     Step 3: check the LBP H/R with pool_data_lbp_pages_found / pool_data_l_reads
     – Great values: 90% for index, 85% for data
     – Good values: 70-80% for index, 65-80% for data
     – Increasing the LBP size can help increase the LBP H/R
     – NB – for each 16 extra LBP pages, the GBP needs 1 extra page for registrations
     Step 4: check the GBP H/R with pool_data_gbp_l_reads, pool_data_gbp_p_reads, etc.
     – Great values: 90% for index, 80% for data
     – Good values: 65-80% for index, 60-75% for data
     – pool_data_l_reads > 10 x pool_data_gbp_l_reads means low GBP dependence – tuning the GBP size in this case may be less valuable
     – pool_data_gbp_invalid_pages > 25% of pool_data_gbp_l_reads means the GBP is really helping out, and could benefit from extra pages
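The Step 1 sizing rule of thumb is easy to capture as a small helper. The percentage bands are the slide's guidance; the function name and the structure are illustrative, not a DB2 API:

```python
# Step 1 sizing rule of thumb from slide 8. The fraction bands come
# from the slide; the helper itself is just a sketch.
def gbp_size_range(lbp_pages_per_member, read_heavy=False):
    """Suggested GBP size range, in 4 KB pages.

    lbp_pages_per_member: list of each member's LBP size in pages.
    read_heavy: ~90%-read workloads can start with a smaller GBP.
    """
    total_lbp = sum(lbp_pages_per_member)
    if read_heavy:
        lo_frac, hi_frac = 0.20, 0.30
    elif len(lbp_pages_per_member) == 2:
        lo_frac, hi_frac = 0.40, 0.50   # 2-member clusters start larger
    else:
        lo_frac, hi_frac = 0.35, 0.40
    return round(total_lbp * lo_frac), round(total_lbp * hi_frac)

# The slide's example: 4 members, 1M LBP pages each
lo, hi = gbp_size_range([1_000_000] * 4)
print(lo, hi)   # 1400000 1600000 -> the slide's 1.4M to 1.6M pages
```

Note that the result is in 4 KB GBP pages regardless of the LBP page size, per the NB above.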
  9. pureScale architectural features for optimum performance
     Page lock negotiation – or: Psst! Hey buddy, can you pass me that page?
     – pureScale page locks are physical locks, indicating which member currently 'owns' the page. Picture the following:
       • Member A acquires a page P, modifies a row on it, and continues with its transaction. 'A' holds an exclusive page lock on page P until 'A' commits
       • Member B wants to modify a different row on the same page P. What now?
     – 'B' doesn't have to wait until 'A' commits & releases the page lock
       • The CF will negotiate the page back from 'A' in the middle of 'A's transaction, on 'B's behalf
       • This provides far better concurrency & performance than having to wait for the page lock until the holder commits
     [Diagram: members A and B with their logs; the CF's global lock manager (GLM) negotiates page P from A to B mid-transaction]
 10. pureScale architectural features for optimum performance
     Table append cache and index page cache
     – What happens in the case of rapid inserts into a single table by multiple members? Or rapid index updates? Will it cause the insert page to 'thrash' back & forth between the members each time one has a new row?
     – No – each member sets aside an extent for insertion into the table, to eliminate contention & page thrashing. Similarly for indexes, with the index page cache
     Lock avoidance
     – pureScale exploits cursor stability (CS) locking semantics to avoid taking locks in many common cases
     – Reduces pathlength and saves trips to the CF
     – Transparent & always on
 11. Notes on storage configuration for performance
     GPFS best practices
     – Automatically configured by the db2cluster command
       • Blocksize >= 1 MB (vs. the 64 KB default) provides noticeably improved performance
       • Direct (unbuffered) I/O for both logs & tablespace containers
       • SCSI-3 P/R on AIX enables faster disk takeover on member failure
     – Separate paths for logs & tablespaces are recommended
     Dominant storage performance factor for pureScale: fast log writes
     – Always important in OLTP
     – Extra important in pureScale, due to log flushes driven by page reclaims
     – Separate filesystems and separate devices, from each other & from tablespaces
     – Ideally, log write latency comfortably under 1 ms
     – Possibly even SSDs, to keep write latencies as low as possible
 12. 12-Member Scalability Example
     Moderately heavy transaction processing workload modeling a warehouse & ordering process
     – Write transaction rate 20%
     – Typical read/write ratio of many OLTP workloads
     No cluster awareness in the app
     – No affinity
     – No partitioning
     – No routing of transactions to members
     Configuration
     – Twelve 8-core p550 members, 64 GB, 5 GHz
     – IBM 20 Gb/s IB HCAs + 7874-024 IB switch
     – Duplexed PowerHA pureScale across 2 additional 8-core p550s, 64 GB, 5 GHz
     – DS8300 storage: 576 15K disks, two 4 Gb FC switches
     – 1 Gb Ethernet client connectivity
     [Diagram: clients (2-way x345) over 1 Gb Ethernet; p550 members and p550 Cluster Caching Facilities on the 20 Gb/s IB pureScale interconnect (7874-024 switch); DS8300 storage behind two 4 Gb FC switches]
 13. 12-Member Scalability Example – Results
     [Chart: throughput vs. 1 member as members are added]
     – 1.98x @ 2 members
     – 3.9x @ 4 members
     – 7.6x @ 8 members
     – 10.4x @ 12 members
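The per-member scaling efficiency implied by the chart can be worked out by dividing each throughput multiplier by the member count:

```python
# Scaling efficiency implied by slide 13's throughput multipliers
# (throughput vs. 1 member, divided by the number of members).
results = {2: 1.98, 4: 3.9, 8: 7.6, 12: 10.4}

for members, speedup in results.items():
    efficiency = 100.0 * speedup / members
    print(f"{members:2d} members: {speedup:>5}x -> {efficiency:.0f}% efficiency")
```

This works out to roughly 99% efficiency at 2 members, declining gently to about 87% at 12, consistent with the near-linear shape of the chart.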
 14. DB2 pureScale Architecture Scalability
     How far will it scale?
     – Take a web-commerce-type workload
       • Read-mostly but not read-only – about 90/10
     – Don't make the application cluster aware
       • No routing of transactions to members
       • Demonstrate transparent application scaling
     – Scale out to the 128-member limit and measure scalability
 15. The 128-member result
     – 2, 4 and 8 members: over 95% scalability
     – 16 members: over 95% scalability
     – 32 members: over 95% scalability
     – 64 members: 95% scalability
     – 88 members: 90% scalability
     – 112 members: 89% scalability
     – 128 members: 84% scalability
 16. Summary
     Performance & scalability are two top goals of pureScale
     – Many architectural features were designed solely to drive the best possible performance
     Monitoring and tuning for pureScale extends existing DB2 interfaces and practices
     – e.g., the techniques for optimizing the GBP/LBP configuration build on steps already familiar to DB2 DBAs
     The pureScale architecture exploits leading-edge low-latency interconnects and RDMA to achieve excellent performance & scalability
     – The initial 12- & 128-member proofpoints are strong evidence of a successful first release, with even better things to come!
 17. Questions?