Episode 4: DB2 pureScale Performance Webinar, Oct 2010

Slides from Episode 4 of the DB2 pureScale webcast series with the IBM lab team.

Transcript

  • 1. DB2 pureScale Performance – Steve Rees, srees@ca.ibm.com – Oct 19, 2010
  • 2. Agenda
    – DB2 pureScale technology review
    – RDMA and low-latency interconnect
    – Monitoring and tuning bufferpools in pureScale
    – Architectural features for top performance
    – Performance metrics
  • 3. DB2 pureScale: Technology Review – Leverage IBM's System z Sysplex Experience and Know-How
    – Clients connect anywhere, and see a single database
      • Clients connect into any member
      • Automatic load balancing and client reroute may change the underlying physical member to which a client is connected
    – DB2 engine runs on several host computers
      • Members co-operate with each other to provide coherent access to the database from any member
    – Integrated cluster services
      • Failure detection, recovery automation, cluster file system
      • In partnership with STG (GPFS, RSCT) and Tivoli (SA MP)
    – Low-latency, high-speed interconnect
      • Special optimizations provide significant advantages on RDMA-capable interconnects like InfiniBand
    – Cluster Caching Facility (CF) technology from STG
      • Efficient global locking and buffer management
      • Synchronous duplexing to a secondary CF ensures availability
    – Data sharing architecture
      • Shared access to the database
      • Members write to their own logs
      • Logs accessible from another host (used during recovery)
    [Figure: clients connecting to any member over the cluster interconnect, with primary and secondary CFs and shared storage holding the database and per-member logs]
  • 4. DB2 pureScale and the low-latency interconnect
    – InfiniBand & uDAPL provide the low-latency RDMA infrastructure exploited by pureScale
    – pureScale currently uses DDR and QDR IB adapters according to platform
      • Peak throughput of about 2-4 M messages per second
      • Message latencies in the 10s of microseconds or even lower
    – The InfiniBand development roadmap indicates continued increases in bit rates
    [Figure: InfiniBand roadmap from www.infinibandta.org]
  • 5. Two-level page buffering – data consistency & improved performance
    – The local bufferpool (LBP) caches both read-only and updated pages for that member
    – The shared GBP in the CF contains references to every page in all LBPs across the cluster
      • References ensure consistency across members – who's interested in which pages, in case the pages are updated
    – The GBP also contains copies of all updated pages from the LBPs
      • Sent from the LBP at transaction commit time
      • Stored in the GBP & available to members on demand
      • A ~30 µs page read request over InfiniBand from the GBP can be more than 100x faster than reading from disk
    – Statistics are kept for tuning
      • Found in LBP vs. found in GBP vs. read from disk
      • Useful in tuning GBP / LBP sizes
    [Figure: members M1-M3 and the CF, with approximate latencies – LBP hit ~10 µs, GBP requests ~30-60 µs, disk read ~5000 µs; expensive disk reads from M1 and M2 are not required, since the modified page can be fetched from the CF]
  • 6. pureScale bufferpool monitoring and tuning
    – Familiar DB2 hit ratio calculations are useful with pureScale
      • HR = (logical reads – physical reads) / logical reads, e.g. (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
      • As usual, physical reads come from disk and logical reads from the bufferpool (in pureScale, this means either the LBP or the GBP), e.g. pool_data_l_reads = pool_data_lbp_pages_found + pool_data_gbp_l_reads
    – New metrics in pureScale support breaking this down by LBP & GBP amounts (of course, there are index equivalents too)
      • pool_data_lbp_pages_found = logical data reads satisfied by the LBP, i.e. we needed a page and it was present & valid in the LBP
      • pool_data_gbp_l_reads = logical data reads attempted at the GBP, i.e. the page was either not present or not valid in the LBP, so we needed to go to the GBP
      • pool_data_gbp_p_reads = physical data reads due to the page not being present in either the LBP or the GBP – essentially the same as the non-pureScale pool_data_p_reads
      • pool_data_gbp_invalid_pages = number of GBP data read attempts due to an LBP page being present but marked invalid – an indicator of the rate of GBP updates & their impact on the LBP
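    These elements can be pulled directly from DB2's monitoring table functions. The query below is a minimal sketch, assuming the MON_GET_BUFFERPOOL table function and the column names listed above are available at your DB2 level (verify against your release):

      -- Sketch: per-member data-page metrics for user bufferpools
      -- (assumes MON_GET_BUFFERPOOL(bufferpool_name, member), with -2 = all members)
      SELECT MEMBER,
             BP_NAME,
             POOL_DATA_L_READS,            -- total logical data reads (LBP + GBP)
             POOL_DATA_P_READS,            -- physical data reads from disk
             POOL_DATA_LBP_PAGES_FOUND,    -- logical reads satisfied in the local bufferpool
             POOL_DATA_GBP_L_READS,        -- logical reads attempted at the group bufferpool
             POOL_DATA_GBP_P_READS,        -- GBP misses that went to disk
             POOL_DATA_GBP_INVALID_PAGES   -- GBP reads caused by invalidated LBP pages
      FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T
      WHERE BP_NAME NOT LIKE 'IBMSYSTEMBP%'
      ORDER BY MEMBER, BP_NAME;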
  • 7. pureScale bufferpool monitoring
    – Overall (and non-pureScale) hit ratio: (pool_data_l_reads – pool_data_p_reads) / pool_data_l_reads
      • Great values: 95% for index, 90% for data
      • Good values: 80-90% for index, 75-85% for data
    – LBP hit ratio: (pool_data_lbp_pages_found / pool_data_l_reads) * 100%
      • Generally lower than the overall hit ratio, since it excludes GBP hits
      • Factors which may affect it, other than LBP size: it increases with a greater portion of read activity in the system (decreasing probability that LBP copies of the page have been invalidated), and may decrease with cluster size (increasing probability that another member has invalidated the LBP page)
    – GBP hit ratio: (pool_data_gbp_l_reads – pool_data_gbp_p_reads) / pool_data_gbp_l_reads
      • A hit here is a read of a previously modified page, so hit ratios are typically quite low – an overall (LBP+GBP) hit ratio in the high 90's can correspond to a GBP hit ratio in the low 80's
      • Factors which may affect it, other than GBP size: it decreases with a greater portion of read activity
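    As an illustration, all three ratios can be computed in one query. A minimal sketch, under the same MON_GET_BUFFERPOOL assumption as above (values expressed as percentages):

      -- Sketch: overall, LBP and GBP data hit ratios per member, as percentages
      SELECT MEMBER,
             BP_NAME,
             DECIMAL(100.0 * (POOL_DATA_L_READS - POOL_DATA_P_READS)
                           / NULLIF(POOL_DATA_L_READS, 0), 5, 2)      AS OVERALL_HIT_PCT,
             DECIMAL(100.0 * POOL_DATA_LBP_PAGES_FOUND
                           / NULLIF(POOL_DATA_L_READS, 0), 5, 2)      AS LBP_HIT_PCT,
             DECIMAL(100.0 * (POOL_DATA_GBP_L_READS - POOL_DATA_GBP_P_READS)
                           / NULLIF(POOL_DATA_GBP_L_READS, 0), 5, 2)  AS GBP_HIT_PCT
      FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS T
      WHERE BP_NAME NOT LIKE 'IBMSYSTEMBP%'
      ORDER BY MEMBER, BP_NAME;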
  • 8. pureScale bufferpool tuning
    – Step 1: the typical rule of thumb for GBP size is 35-40% of the sum of all members' LBP sizes (a worked sizing sketch follows this slide)
      • e.g. 4 members with an LBP size of 1M pages each -> GBP size of 1.4 to 1.6M pages
      • NB: don't forget, the GBP page size is always 4 KB, no matter what the LBP page size is
      • If your workload is very read-heavy (e.g. 90% read), the initial GBP allocation could be in the 20-30% range
      • For 2-member clusters, you may want to start with 40-50% of total LBP, vs. 35-40%
    – Step 2: monitor the overall BP hit ratio as usual, with pool_data_l_reads, pool_data_p_reads, etc.
      • Meets your goals? If yes, then done!
    – Step 3: check the LBP hit ratio with pool_data_lbp_pages_found / pool_data_l_reads
      • Great values: 90% for index, 85% for data
      • Good values: 70-80% for index, 65-80% for data
      • Increasing the LBP size can help increase the LBP hit ratio
      • NB: for each 16 extra LBP pages, the GBP needs 1 extra page for registrations
    – Step 4: check the GBP hit ratio with pool_data_gbp_l_reads, pool_data_gbp_p_reads, etc.
      • Great values: 90% for index, 80% for data
      • Good values: 65-80% for index, 60-75% for data
      • pool_data_l_reads > 10 x pool_data_gbp_l_reads means low GBP dependence – tuning GBP size in this case may be less valuable
      • pool_data_gbp_invalid_pages > 25% of pool_data_gbp_l_reads means the GBP is really helping out, and could benefit from extra pages
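    To make Step 1 concrete, here is a small worked sketch of the sizing arithmetic for a hypothetical 4-member cluster with 1M-page (4 KB) LBPs per member. The cf_gbp_sz update shown in the comment is an assumption about how the GBP size is applied and should be checked against the documentation for your release:

      -- Sketch: GBP sizing rule of thumb (here, 40% of the sum of all members' LBP sizes)
      -- Hypothetical cluster: 4 members x 1,000,000 LBP pages each (4 KB pages)
      SELECT 4 * 1000000             AS TOTAL_LBP_PAGES,
             INT(0.40 * 4 * 1000000) AS TARGET_GBP_4K_PAGES   -- 1,600,000 pages of 4 KB each
      FROM SYSIBM.SYSDUMMY1;
      -- The resulting value would then be applied to the GBP, e.g. (assumption - verify syntax):
      --   db2 UPDATE DB CFG FOR mydb USING cf_gbp_sz 1600000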
  • 9. pureScale architectural features for optimum performance
    – Page lock negotiation – or: Psst! Hey buddy, can you pass me that page?
      • pureScale page locks are physical locks, indicating which member currently 'owns' the page. Picture the following:
      • Member A acquires a page P, modifies a row on it, and continues with its transaction; 'A' holds an exclusive page lock on page P until 'A' commits
      • Member B wants to modify a different row on the same page P. What now?
      • 'B' doesn't have to wait until 'A' commits & releases the page lock: the CF will negotiate the page back from 'A' in the middle of 'A's transaction, on 'B's behalf
      • This provides far better concurrency & performance than having to wait for the page lock until the holder commits
    [Figure: members A and B negotiating page P through the CF's global lock manager (GLM)]
  • 10. pureScale architectural features for optimum performance
    – Table append cache and index page cache
      • What happens in the case of rapid inserts into a single table by multiple members? Or rapid index updates? Will the insert page 'thrash' back & forth between the members each time one has a new row?
      • No – each member sets aside an extent for insertion into the table, eliminating contention & page thrashing; indexes are handled similarly with the index page cache
    – Lock avoidance
      • pureScale exploits cursor stability (CS) locking semantics to avoid taking locks in many common cases
      • Reduces pathlength and saves trips to the CF
      • Transparent & always on
  • 11. Notes on storage configuration for performance
    – GPFS best practices – automatically configured by the db2cluster command
      • Blocksize >= 1 MB (vs. the 64 KB default) provides noticeably improved performance
      • Direct (unbuffered) IO for both logs & tablespace containers
      • SCSI-3 P/R on AIX enables faster disk takeover on member failure
      • Separate paths for logs & tablespaces are recommended
    – The dominant storage performance factor for pureScale: fast log writes
      • Always important in OLTP
      • Extra important in pureScale due to log flushes driven by page reclaims
      • Separate filesystems, and separate devices from each other & from tablespaces
      • Ideally, log write latency comfortably under 1 ms
      • Possibly even SSDs to keep write latencies as low as possible
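    One way to keep an eye on the "comfortably under 1 ms" goal is to compute average log write latency from the monitor elements. A minimal sketch, assuming a DB2 level that provides the MON_GET_TRANSACTION_LOG table function with LOG_WRITE_TIME (total milliseconds) and NUM_LOG_WRITE_IO columns; the element names should be verified on your release:

      -- Sketch: approximate average log write latency per member, in milliseconds
      SELECT MEMBER,
             NUM_LOG_WRITE_IO,
             DECIMAL(1.0 * LOG_WRITE_TIME
                         / NULLIF(NUM_LOG_WRITE_IO, 0), 10, 3) AS AVG_LOG_WRITE_MS
      FROM TABLE(MON_GET_TRANSACTION_LOG(-2)) AS T
      ORDER BY MEMBER;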
  • 12. 12-Member Scalability Example
    – Moderately heavy transaction processing workload modeling a warehouse & ordering process
      • Write transaction rate of 20% – a typical read/write ratio for many OLTP workloads
    – No cluster awareness in the app
      • No affinity
      • No partitioning
      • No routing of transactions to members
    – Configuration
      • Twelve 8-core p550 members, 64 GB, 5 GHz
      • IBM 20 Gb/s IB HCAs + 7874-024 IB switch
      • Duplexed PowerHA pureScale (cluster caching facilities) across 2 additional 8-core p550s, 64 GB, 5 GHz
      • DS8300 storage, 576 15K disks, two 4 Gb FC switches
    [Figure: topology – clients (2-way x345) connect over 1 Gb Ethernet to the p550 members; members and CFs communicate over the 20 Gb IB pureScale interconnect via the 7874-024 switch; storage is reached through two 4 Gb FC switches to the DS8300]
  • 13. 12-Member Scalability Example – Results
    – Throughput relative to 1 member:
      • 1.98x @ 2 members
      • 3.9x @ 4 members
      • 7.6x @ 8 members
      • 10.4x @ 12 members
    [Figure: throughput vs. number of members]
  • 14. DB2 pureScale Architecture Scalability – How far will it scale?
    – Take a web-commerce type workload
      • Read mostly but not read only – about 90/10
    – Don't make the application cluster aware
      • No routing of transactions to members
      • Demonstrate transparent application scaling
    – Scale out to the 128-member limit and measure scalability
  • 15. The 128-member result
    – 2, 4 and 8 members: over 95% scalability
    – 16 members: over 95% scalability
    – 32 members: over 95% scalability
    – 64 members: 95% scalability
    – 88 members: 90% scalability
    – 112 members: 89% scalability
    – 128 members: 84% scalability
  • 16. Summary
    – Performance & scalability are two top goals of pureScale – many architectural features were designed solely to drive the best possible performance
    – Monitoring and tuning for pureScale extend existing DB2 interfaces and practices – e.g., techniques for optimizing the GBP/LBP configuration build on steps already familiar to DB2 DBAs
    – The pureScale architecture exploits leading-edge low-latency interconnects and RDMA to achieve excellent performance & scalability – the initial 12- and 128-member proofpoints are strong evidence of a successful first release, with even better things to come!
  • 17. Questions