Episode 4: DB2 pureScale Performance Webinar, Oct 2010


Slides from Episode 4 of the DB2 pureScale webcast series with the IBM lab team.

  • 1. DB2 pureScale Performance. Steve Rees, srees@ca.ibm.com, Oct 19, 2010
  • 2. Agenda
    – DB2 pureScale technology review
    – RDMA and low-latency interconnect
    – Monitoring and tuning bufferpools in pureScale
    – Architectural features for top performance
    – Performance metrics
  • 3. DB2 pureScale: Technology Review
    Leverage IBM's System z Sysplex experience and know-how.
    Clients connect anywhere, see single database
      – Clients connect into any member
      – Automatic load balancing and client reroute may change the underlying physical member to which a client is connected
    DB2 engine runs on several host computers
      – Members co-operate with each other to provide coherent access to the database from any member
    Integrated cluster services
      – Failure detection, recovery automation, cluster file system
      – In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)
    Low latency, high speed interconnect
      – Special optimizations provide significant advantages on RDMA-capable interconnects like Infiniband
    Cluster Caching Facility (CF) technology from STG
      – Efficient global locking and buffer management
      – Synchronous duplexing to secondary ensures availability
    Data sharing architecture
      – Shared access to database
      – Members write to their own logs
      – Logs accessible from another host (used during recovery)
    [Diagram: clients see a single database view and connect to any of several members; members and the primary and secondary CFs each run cluster services (CS) and communicate over the cluster interconnect; per-member logs and the database reside on shared storage.]
  • 4. DB2 pureScale and low-latency interconnect
    Infiniband & uDAPL provide the low-latency RDMA infrastructure exploited by pureScale.
    pureScale currently uses DDR and QDR IB adapters according to platform
      – Peak throughput of about 2-4 M messages per second
      – Message latencies in the 10s of microseconds or even lower
    [Figure: Infiniband roadmap from www.infinibandta.org, indicating continued increases in bit rates.]
  • 5. Two-level page buffering: data consistency & improved performance
    The local bufferpool (LBP) caches both read-only and updated pages for that member.
    The shared GBP contains references to every page in all LBPs across the cluster
      – References ensure consistency across members: who's interested in which pages, in case the pages are updated
    The GBP also contains copies of all updated pages from the LBPs
      – Sent from the LBP at transaction commit time
      – Stored in the GBP & available to members on demand
      – A 30 µs page read request over Infiniband from the GBP can be more than 100x faster than reading from disk
    Statistics are kept for tuning
      – Found in LBP vs. found in GBP vs. read from disk
      – Useful in tuning GBP / LBP sizes
    [Diagram: members M1-M3 and the CF; GBP requests complete in roughly 10-60 µs, while a disk read costs about 5000 µs, so expensive disk reads from M1 and M2 are not required: they get the modified page from the CF.]
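
    A quick back-of-envelope check of the figures on this slide, written as runnable DB2 SQL against the standard SYSIBM.SYSDUMMY1 dummy table; the 90/8/2 hit-ratio split in the second query is purely hypothetical, for illustration only:

        -- Speedup of a ~30 us GBP read over a ~5000 us disk read (both figures from the slide)
        SELECT 5000.0 / 30.0 AS gbp_vs_disk_speedup
        FROM sysibm.sysdummy1;

        -- Hypothetical effective read latency: assume 90% of reads hit the LBP (~0 us,
        -- in-memory), 8% hit the GBP (~30 us), and 2% go to disk (~5000 us)
        SELECT 0.90 * 0 + 0.08 * 30 + 0.02 * 5000 AS avg_read_latency_us
        FROM sysibm.sysdummy1;

    The first query returns about 167, consistent with the "more than 100x" claim; the second (about 102 us) shows how even a small disk-read fraction dominates the average latency.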
  • 6. pureScale bufferpool monitoring and tuning
    Familiar DB2 hit ratio calculations are useful with pureScale
      – HR = (logical reads – physical reads) / logical reads, e.g. (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
      – As usual, physical reads come from disk and logical reads from the bufferpool (in pureScale, this means either the LBP or GBP), e.g. pool_data_l_reads = pool_data_lbp_pages_found + pool_data_gbp_l_reads
    New metrics in pureScale support breaking this down by LBP & GBP amounts (the data metrics are shown here; of course, there are index ones too)
      – pool_data_lbp_pages_found = logical data reads satisfied by the LBP
        • i.e., we needed a page, and it was present & valid in the LBP
      – pool_data_gbp_l_reads = logical data reads attempted at the GBP
        • i.e., the page was either not present or not valid in the LBP, so we needed to go to the GBP
      – pool_data_gbp_p_reads = physical data reads due to the page not being present in either the LBP or GBP
        • Essentially the same as the non-pureScale pool_data_p_reads
      – pool_data_gbp_invalid_pages = number of GBP data read attempts due to an LBP page being present but marked invalid
        • An indicator of the rate of GBP updates & their impact on the LBP
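
    A sketch of pulling these counters at runtime, assuming the MON_GET_BUFFERPOOL table function (available from DB2 9.7 onward; a second argument of -2 requests all members) and the monitor-element column names listed on the slide:

        -- Raw data-page counters per bufferpool, summed across all members
        SELECT VARCHAR(bp_name, 20)              AS bufferpool,
               SUM(pool_data_l_reads)            AS data_l_reads,
               SUM(pool_data_p_reads)            AS data_p_reads,
               SUM(pool_data_lbp_pages_found)    AS lbp_pages_found,
               SUM(pool_data_gbp_l_reads)        AS gbp_l_reads,
               SUM(pool_data_gbp_p_reads)        AS gbp_p_reads,
               SUM(pool_data_gbp_invalid_pages)  AS gbp_invalid_pages
        FROM TABLE(mon_get_bufferpool(NULL, -2)) AS t
        GROUP BY bp_name;

    The same query with the pool_index_* elements substituted covers the index-side counters mentioned above.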
  • 7. pureScale bufferpool monitoring
    Overall (and non-pureScale) hit ratio
      – (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
      – Great values: 95% for index, 90% for data
      – Good values: 80-90% for index, 75-85% for data
    LBP hit ratio
      – (pool_data_lbp_pages_found / pool_data_l_reads) * 100%
      – Generally lower than the overall hit ratio, since it excludes GBP hits
      – Factors which may affect it, other than LBP size:
        • Increases with a greater portion of read activity in the system (decreasing probability that LBP copies of the page have been invalidated)
        • May decrease with cluster size (increasing probability that another member has invalidated the LBP page)
    GBP hit ratio
      – (pool_data_gbp_l_reads – pool_data_gbp_p_reads) / pool_data_gbp_l_reads
      – A hit here is a read of a previously modified page, so hit ratios are typically quite low
        • An overall (LBP+GBP) H/R in the high 90's can correspond to a GBP H/R in the low 80's
      – Factors which may affect it, other than GBP size:
        • Decreases with a greater portion of read activity
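
    The three ratios above can be computed together; a sketch building on the same assumed MON_GET_BUFFERPOOL function, with NULLIF guarding against division by zero on idle bufferpools:

        SELECT VARCHAR(bp_name, 20) AS bufferpool,
               -- overall H/R: (logical - physical) / logical
               DECIMAL((SUM(pool_data_l_reads) - SUM(pool_data_p_reads)) * 100.0
                       / NULLIF(SUM(pool_data_l_reads), 0), 5, 2)      AS overall_hr_pct,
               -- LBP H/R: LBP hits / all logical reads
               DECIMAL(SUM(pool_data_lbp_pages_found) * 100.0
                       / NULLIF(SUM(pool_data_l_reads), 0), 5, 2)      AS lbp_hr_pct,
               -- GBP H/R: (GBP logical - GBP physical) / GBP logical
               DECIMAL((SUM(pool_data_gbp_l_reads) - SUM(pool_data_gbp_p_reads)) * 100.0
                       / NULLIF(SUM(pool_data_gbp_l_reads), 0), 5, 2)  AS gbp_hr_pct
        FROM TABLE(mon_get_bufferpool(NULL, -2)) AS t
        GROUP BY bp_name;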
  • 8. pureScale bufferpool tuning
    Step 1: typical rule of thumb for GBP size = 35-40% of Σ( all members' LBP sizes )
      – e.g. 4 members with an LBP size of 1M pages each -> GBP size of 1.4 to 1.6M pages
      – NB: don't forget, GBP page size is always 4 kB, no matter what the LBP page size is
      – If your workload is very read-heavy (e.g. 90% read), the initial GBP allocation could be in the 20-30% range
      – For 2-member clusters, you may want to start with 40-50% of total LBP, vs. 35-40%
    Step 2: monitor the overall BP hit ratio as usual, with pool_data_l_reads, pool_data_p_reads, etc.
      – Meets your goals? If yes, then done!
    Step 3: check the LBP H/R with pool_data_lbp_pages_found / pool_data_l_reads
      – Great values: 90% for index, 85% for data
      – Good values: 70-80% for index, 65-80% for data
      – Increasing LBP size can help increase the LBP H/R
      – NB: for each 16 extra LBP pages, the GBP needs 1 extra page for registrations
    Step 4: check the GBP H/R with pool_data_gbp_l_reads, pool_data_gbp_p_reads, etc.
      – Great values: 90% for index, 80% for data
      – Good values: 65-80% for index, 60-75% for data
      – pool_data_l_reads > 10 x pool_data_gbp_l_reads means low GBP dependence; tuning GBP size in this case may be less valuable
      – pool_data_gbp_invalid_pages > 25% of pool_data_gbp_l_reads means the GBP is really helping out, and could benefit from extra pages
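
    A sketch of the step-1 arithmetic driven from the catalog, assuming the standard SYSCAT.BUFFERPOOLS view; NPAGES there is a per-member figure (and can be -2 when the bufferpool is AUTOMATIC, which this simple version just skips), and the result is normalized to 4 kB pages because that is the GBP's fixed page size. The member count of 4 is a placeholder for your own cluster:

        -- Rule-of-thumb GBP sizing: 35-40% of the sum of all members' LBP sizes,
        -- expressed in 4 kB pages (4 = placeholder member count)
        SELECT SUM(BIGINT(npages) * pagesize / 4096) * 4        AS total_lbp_4k_pages,
               SUM(BIGINT(npages) * pagesize / 4096) * 4 * 0.35 AS gbp_4k_pages_low,
               SUM(BIGINT(npages) * pagesize / 4096) * 4 * 0.40 AS gbp_4k_pages_high
        FROM syscat.bufferpools
        WHERE npages > 0;   -- skip AUTOMATIC (-2) bufferpools in this sketch

    For the worked example above: 4 members x 1M pages = 4M LBP pages, and 35-40% of that gives the 1.4-1.6M page GBP starting point.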
  • 9. pureScale architectural features for optimum performance
    Page lock negotiation, or: Psst! Hey buddy, can you pass me that page?
      – pureScale page locks are physical locks, indicating which member currently 'owns' the page. Picture the following:
        • Member A: acquires a page P, modifies a row on it, and continues with its transaction. 'A' holds an exclusive page lock on page P until 'A' commits
        • Member B: wants to modify a different row on the same page P. What now?
      – 'B' doesn't have to wait until 'A' commits & releases the page lock
        • The CF will negotiate the page back from 'A' in the middle of 'A's transaction, on 'B's behalf
        • Provides far better concurrency & performance than needing to wait for a page lock until the holder commits
    [Diagram: Member B requests page P through the CF's global lock manager (GLM); the CF negotiates the page back from Member A mid-transaction on B's behalf.]
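
    The scenario in the diagram corresponds to ordinary SQL issued from two different members; a hypothetical illustration (table t, its id and col columns, and the assumption that rows 1 and 2 share page P are all made up for this sketch):

        -- On member A (transaction left open, so A holds the exclusive page lock on P)
        UPDATE t SET col = col + 1 WHERE id = 1;

        -- On member B, before A commits: a different row that happens to live on page P.
        -- The CF negotiates the page back from A on B's behalf; B does not wait for A's commit.
        UPDATE t SET col = col + 1 WHERE id = 2;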
  • 10. pureScale architectural features for optimum performance
    Table append cache and index page cache
      – What happens in the case of rapid inserts into a single table by multiple members? Or rapid index updates? Will it cause the insert page to 'thrash' back & forth between the members, each time one has a new row?
      – No: each member sets aside an extent for insertion into the table, to eliminate contention & page thrashing. Similarly for indexes with the index page cache.
    Lock avoidance
      – pureScale exploits cursor stability (CS) locking semantics to avoid taking locks in many common cases
      – Reduces pathlength and saves trips to the CF
      – Transparent & always on
  • 11. Notes on storage configuration for performance
    GPFS best practices
      – Automatically configured by the db2cluster command:
        • Blocksize >= 1 MB (vs. the 64 kB default) provides noticeably improved performance
        • Direct (unbuffered) IO for both logs & tablespace containers
        • SCSI-3 P/R on AIX enables faster disk takeover on member failure
      – Separate paths for logs & tablespaces are recommended
    Dominant storage performance factor for pureScale: fast log writes
      – Always important in OLTP
      – Extra important in pureScale due to log flushes driven by page reclaims
      – Separate filesystems, separate devices from each other & from tablespaces
      – Ideally, comfortably under 1 ms
      – Possibly even SSDs, to keep write latencies as low as possible
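
    Average log write latency can be estimated from the snapshot monitor; a sketch assuming the SYSIBMADM.SNAPDB administrative view and its log_write_time_s / log_write_time_ns / num_log_write_io elements:

        -- Approximate average log write latency in milliseconds; target: comfortably under 1 ms
        SELECT DECIMAL((log_write_time_s * 1000.0 + log_write_time_ns / 1000000.0)
                       / NULLIF(num_log_write_io, 0), 10, 3) AS avg_log_write_ms
        FROM sysibmadm.snapdb;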
  • 12. 12-Member Scalability Example
    Moderately heavy transaction processing workload modeling a warehouse & ordering process
      – Write transaction rate of 20%
      – Typical read/write ratio of many OLTP workloads
    No cluster awareness in the app
      – No affinity
      – No partitioning
      – No routing of transactions to members
    Configuration
      – Twelve 8-core p550 members, 64 GB, 5 GHz
      – IBM 20Gb/s IB HCAs + 7874-024 IB Switch
      – Duplexed PowerHA pureScale across 2 additional 8-core p550s, 64 GB, 5 GHz
      – DS8300 storage, 576 15K disks, two 4Gb FC switches
    [Diagram: clients (2-way x345) connect over 1Gb Ethernet to the p550 members; members and the p550 Cluster Caching Facility share a 20Gb IB pureScale interconnect through the 7874-024 switch, and reach the DS8300 storage through two 4Gb FC switches.]
  • 13. 12-Member Scalability Example - Results
    [Chart: throughput vs. 1 member, plotted against number of members: 1.98x @ 2 members, 3.9x @ 4 members, 7.6x @ 8 members, 10.4x @ 12 members.]
  • 14. DB2 pureScale Architecture Scalability
    How far will it scale?
      – Take a web commerce type workload: read mostly but not read only, about 90/10
      – Don't make the application cluster aware: no routing of transactions to members, to demonstrate transparent application scaling
      – Scale out to the 128-member limit and measure scalability
  • 15. The 128-member result
    [Chart: scalability vs. cluster size: over 95% at 2, 4, 8, 16, and 32 members; 95% at 64 members; 90% at 88 members; 89% at 112 members; 84% at 128 members.]
  • 16. Summary
    Performance & scalability are two top goals of pureScale: many architectural features were designed solely to drive the best possible performance.
    Monitoring and tuning for pureScale extend existing DB2 interfaces and practices; e.g., the techniques for optimizing GBP/LBP configuration build on steps already familiar to DB2 DBAs.
    The pureScale architecture exploits leading-edge low-latency interconnects and RDMA to achieve excellent performance & scalability. The initial 12- and 128-member proofpoints are strong evidence of a successful first release, with even better things to come!
  • 17. Questions