2, 4 and 8 Members Over 95% Scalability Scalability for OLTP Applications 64 Members 95% Scalability 16 Members Over 95% S...
Agenda <ul><li>What Improves OLTP Performance? </li></ul><ul><li>Comparing OLTP Performance </li></ul><ul><li>Comparing Cl...
Online Recovery <ul><li>DB2 pureScale design point is to  maximize availability during failure  recovery processing </li><...
Steps Involved in DB2 pureScale Member Failure <ul><li>Failure Detection </li></ul><ul><li>Recovery process pulls directly...
Failure Detection for Failed Member <ul><li>DB2 has a watchdog process to monitor itself for software failure </li></ul><u...
Member Failure Summary DB2 Single Database View CF Shared Data <ul><li>Member Failure </li></ul><ul><li>DB2 Cluster Servic...
Steps involved in a RAC node failure <ul><li>Node failure detection </li></ul><ul><li>Data block remastering </li></ul><ul...
With RAC – Access to GRD and Disks are Frozen   <ul><li>Global Resource Directory (GRD) Redistribution </li></ul>No more I...
With RAC – Pages that Need Recovery are Locked GRD GRD Instance 1 fails I/O Requests  are Frozen Instance  1 Instance  2 I...
DB2 pureScale – No Freeze at All   Member 1 fails Member 1 Member 2 Member 3 No I/O  Freeze CF knows what rows on these pa...
Agenda <ul><li>What Improves OLTP Performance? </li></ul><ul><li>Comparing Cluster Scalability </li></ul><ul><li>Comparing...
Sample of Feedback… <ul><li>“ We compared DB2 with Oracle and found that DB2 was more reliable and has a better price-perf...
Thank You! <ul><li>For more information, see  www.ibm.com/db2 </li></ul><ul><li>Or contact me at: </li></ul>Conor O’Mahony...
Showdown: IBM DB2 versus Oracle Database for OLTP

A comparison of IBM DB2 and Oracle Database for OnLine Transaction Processing (OLTP) environments.

  • What are the most important features in an RDBMS for OLTP transactions? In order to deliver very high throughput levels for transactional systems, the RDBMS must be able to efficiently perform I/O operations without holding up the transaction. It must be able to utilize memory more efficiently and must also be able to effectively handle large numbers of users. These three critical areas enable a database server to deliver very high levels of performance. We will explore each of these three areas in detail on the following 3 slides.
  • In very high volume transactional systems the logger can quickly become the bottleneck. DB2 (and other RDBMSs) can do most of the work a transaction requires completely in memory (updates occur in the buffer pool in memory, authorizations and access plans are all cached in memory, etc.). However, there is one thing that cannot happen in memory. Whenever a transaction performs a COMMIT, the information that tells the RDBMS how to redo that transaction (i.e., the log information) must be flushed to disk. If the committed transaction is not recorded in the log files on disk, then it would be possible to lose committed transactions. Therefore all database servers write records to the log files whenever a user commits a transaction. In order to achieve very high concurrency and high throughput, it is essential that the logger be as efficient as possible, since these I/Os can quickly become the bottleneck (disk access is significantly slower than memory access). There is a very strong proof point that demonstrates DB2 has a much more efficient logger than our competitors. TPC-C is an industry standard transaction processing benchmark that all major database vendors participate in. Each vendor runs their own benchmarks to try to demonstrate they have the best RDBMS, and DB2 comes out on top more often than any other database vendor (but more on that later). One interesting thing to note about TPC-C is that one of its requirements is that you also publish how much log space you consume during the benchmark run. By comparing the log space consumed (and knowing that this standard benchmark requires every vendor to run the exact same transactions over and over again), we can compare the efficiency of the three database vendors' loggers. The most current TPC-C results (as of March 18, 2008) are shown on this chart. You can see that for each standard TPC-C transaction, DB2 produced 2.4 KB of log.
Oracle’s top result, with 10gR2 (no 11g top result as of 3/18/2008), consumes 2x that much log space, meaning that the DB2 logger is twice as efficient and therefore can deliver higher levels of throughput. Oracle also ran a TPC-C benchmark with RAC and consumed 20x more log space than DB2; ask Oracle why RAC consumes so much log space for the same transactions! The Microsoft SQL Server 2005 result was even worse than Oracle's, consuming more than 2.5x the log space of DB2. These benchmarks are the most highly tuned database systems available (tuned by each vendor's own benchmark experts). This reduction in logging is one of the reasons why DB2 delivers better OLTP performance than Oracle and Microsoft.
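The per-transaction log figures quoted above can be turned into a quick back-of-envelope comparison. The following sketch is illustrative only: the 2.4/4.9/6.0 KB values come from the slide, while the 100,000 tps rate in the example is invented to show the bandwidth implication.

```python
# Per-transaction log volume from the TPC-C disclosures quoted above
# (figures from the slide, not re-measured here).
LOG_KB_PER_TXN = {
    "DB2 9": 2.4,
    "Oracle 10gR2": 4.9,
    "SQL Server 2005": 6.0,
}

def relative_log_volume(baseline: str = "DB2 9") -> dict[str, float]:
    """Log bytes written per transaction, relative to the baseline engine."""
    base = LOG_KB_PER_TXN[baseline]
    return {name: round(kb / base, 2) for name, kb in LOG_KB_PER_TXN.items()}

def log_mb_per_second(tps: int, kb_per_txn: float) -> float:
    """Sustained log-write bandwidth needed at a given transaction rate."""
    return tps * kb_per_txn / 1024

# At a hypothetical 100,000 transactions/sec, smaller log records mean the
# log device must sustain roughly half the write bandwidth.
print(relative_log_volume())
print(round(log_mb_per_second(100_000, 2.4), 1), "MB/s of log for DB2")
```

Since the benchmark forces every vendor to run identical transactions, the ratio of log bytes per transaction is a fair proxy for logger efficiency.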
  • Efficient use of memory is also critical for high volume transaction processing. Given a limited amount of physical memory, you want your database to utilize it to the fullest in order to improve the throughput of your system. DB2 has two unique advantages over Oracle in this area. The first is that DB2 allows for multiple buffer pools. In Oracle you can have only one buffer pool per page size (i.e., one 4KB pool, one 8KB pool, one 16KB pool, and one 32KB pool). This can severely limit your ability to effectively utilize the memory on the server to tune the system for optimal performance. DB2, however, allows as many buffer pools of any page size as you like. For example, you can have 4 buffer pools of 4KB and another 5 buffer pools of 8KB, etc. You can choose the buffer pool configuration that best suits your transaction processing needs. As an example, on a server with 2TB of real memory in a TPC-C benchmark, DB2 allocated several buffer pools of different page sizes whose aggregate size was 1.9TB. With the new threaded engine in DB2 9.5 there is even more advantage over Oracle. By using threads rather than processes for user connections, the amount of memory consumed per connection is significantly lower. This allows more user connections for a given amount of memory and leaves more memory available to other areas of DB2 (like the buffer pool). This better memory utilization again results in higher throughput and better performance. Later in this presentation we will talk about the Self Tuning Memory Manager (STMM), which shows that not only does DB2 better exploit memory for higher performance, but it does so with less administration required.
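The buffer-pool flexibility described above can be sketched in a few lines. This is a hypothetical model, not DB2 or Oracle code: the pool names and page counts are invented, and the "one pool per page size" check stands in for the Oracle restriction the note describes.

```python
from dataclasses import dataclass

@dataclass
class BufferPool:
    name: str
    page_size_kb: int   # 4, 8, 16, or 32
    pages: int

    @property
    def size_gb(self) -> float:
        return self.page_size_kb * self.pages / (1024 * 1024)

def total_size_gb(pools: list) -> float:
    return sum(p.size_gb for p in pools)

def one_pool_per_page_size(pools: list) -> bool:
    """Oracle-style restriction: reject layouts with duplicate page sizes."""
    sizes = [p.page_size_kb for p in pools]
    return len(sizes) == len(set(sizes))

# DB2-style layout: several pools may share a page size, so a hot table
# can get its own pool, isolated from large scans, without changing page size.
db2_pools = [
    BufferPool("BP_CUSTOMER", 4, 50_000_000),
    BufferPool("BP_ORDERS",   4, 30_000_000),   # second 4 KB pool
    BufferPool("BP_INDEXES",  8, 10_000_000),
]

print(round(total_size_gb(db2_pools), 1), "GB allocated")
print(one_pool_per_page_size(db2_pools))   # fails the Oracle-style check
```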
  • The final area that is critical to high transaction performance is the ability to support large numbers of concurrent users. Both DB2 and Oracle have the ability to do connection concentration to reduce memory requirements on the server. However, only DB2 has the threaded engine mentioned on the previous slide. This enables DB2 to scale higher than Oracle on the same server with the same amount of memory and therefore deliver higher throughput.
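The connection-scalability argument above is simple arithmetic, sketched below. Every number here is illustrative except the roughly 1 MB per-connection saving quoted on slide 5; the 2 MB process-model footprint and the server sizes are assumptions for the example only.

```python
def max_connections(server_mem_mb: int, reserved_mb: int,
                    footprint_mb: float) -> int:
    """Connections that fit after reserving memory for buffer pools etc."""
    return int((server_mem_mb - reserved_mb) // footprint_mb)

SERVER_MB   = 64 * 1024   # 64 GB server (illustrative)
RESERVED_MB = 48 * 1024   # buffer pools, caches, sort heaps (illustrative)

# Process-per-connection model vs. a threaded engine saving ~1 MB each.
process_model  = max_connections(SERVER_MB, RESERVED_MB, footprint_mb=2.0)
threaded_model = max_connections(SERVER_MB, RESERVED_MB, footprint_mb=1.0)

# Halving the per-connection footprint doubles the connections that fit,
# or frees the same memory for a larger buffer pool.
print(process_model, threaded_model)
```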
  • There are several transaction processing benchmarks that demonstrate DB2's performance leadership over Oracle. The first is TPC-C, an industry standard transaction processing benchmark. SAP Sales and Distribution (SD) is also a widely used performance benchmark, which simulates real-world SAP transactions. The third transaction processing benchmark is SPECjAppServer, which measures the performance of a web-based Java application on the database system. We will discuss each of these benchmarks on the following slides.
  • Benchmarks are often a leapfrog game where, on any given day, one database vendor can be in front of the rest if they run on some newly announced hardware or the latest software versions. This chart represents days of leadership for TPC-C from Jan 1, 2003 through April 21, 2008. It measures how long each vendor has held the top spot in TPC-C. Over this five-year period, DB2 has been in a leadership position almost 2x longer than Oracle and has in fact led longer than all other database vendors combined.
  • It is not very often that you get an apples-to-apples comparison where two database vendors run their benchmarks on the exact same hardware. This result is slightly dated (DB2 v8 against Oracle 10g), but it shows that on exactly the same hardware, DB2 delivered 16% better performance than Oracle. In fact, you would need 10 CPUs of Oracle to match the performance of 8 CPUs of DB2 on this class of server.
  • In fact, Oracle has rarely been able to challenge DB2 over the past 5 years on the SAP SD 3-tier benchmark. This chart represents days of leadership for SAP SD 3-tier since Jan 1, 2003. As you can see, DB2 has held the lead over the last 5 years 8 times longer than Oracle, the only other competitor to lead in this timeframe.
  • This result shows the top SAP SD 3-tier benchmark results as of March 18, 2008. SAP Sales and Distribution 3-tier represents a configuration where the database software runs on its own server hardware with several SAP application servers in the middle tier. This is the configuration most enterprise customers would run their SAP workloads on, and DB2 has demonstrated clear performance leadership in this area.
  • On the SAP SD 2-tier benchmark DB2 leads Oracle by 18% using half the number of processor cores. On April 8, 2008, DB2 9.5 running on a 64-core IBM Power 595 with AIX 6.1 delivered 35,400 SD users. Oracle's top result is 30,000 SD users with 10g running on a 128-core HP Integrity Superdome with HP-UX.
  • A server process that wants to access a data page, for example page 501, will first check to see if that page is in its local buffer pool (step 1). If this page is not found, the server process will send an inter-process communication (IPC) request to a GCS process in order to ask the master node for that data page (step 2). This results in the server process yielding the CPU and the CPU performing a context switch to potentially re-establish the GCS process on the CPU to process the interrupt. High levels of context switching can be very costly. The GCS process then sends an IP request to the master node for the data block being requested (step 3). Because IP calls are processed in the operating system kernel, the GCS process has to copy the requested information into kernel memory and then execute expensive IP stack calls to push the request to the remote node. Even if an InfiniBand network is being used, Oracle still uses IP over InfiniBand or, in some cases, Reliable Datagram Sockets (RDS). Use of a socket protocol, even over InfiniBand, is costly due to processor interrupts, IP stack traversal, etc. Next, the remote master GCS process will receive an interrupt and will be scheduled on the CPU to process the request. It will check to see if any other member has the page in its buffer cache. In this example, no member has the page, so the GCS process will send an IP message back to the requester telling it to read the page from disk (step 4). The GCS process on the requesting node will be interrupted again to process the incoming IP request, and will in turn send an IPC interrupt (step 5) to the server process to inform it that no other node in the cluster has the page. The server process will then read the page from disk into its own buffer cache (step 6).
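The six-step RAC read path above can be sketched as a simple event tally. This is a model of the note's narrative, not a measurement of RAC: the per-step context-switch and IP-stack counts are one reading of the text, and the step labels paraphrase it.

```python
# Each tuple: (step description, context switches, kernel/IP-stack crossings),
# following the narrative above. Counts are a model, not a measurement.
RAC_READ_STEPS = [
    ("check local buffer pool",                0, 0),
    ("IPC to local GCS process",               1, 0),
    ("IP request to master instance",          1, 1),  # kernel copy + send
    ("IP reply: no holder, read from disk",    1, 1),
    ("IPC interrupt back to server process",   1, 0),
    ("read page 501 from disk",                0, 0),
]

def tally(steps):
    """Total expensive operations on the miss path."""
    switches = sum(s[1] for s in steps)
    ip_calls = sum(s[2] for s in steps)
    return switches, ip_calls

# Even a simple "nobody has the page" miss pays several context switches
# and two traversals of the IP stack before any disk I/O starts.
print(tally(RAC_READ_STEPS))
```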
  • This slide illustrates the advantage of DB2 pureScale for very efficient access to data. A comparison to Oracle RAC will follow in the next section. The steps listed above show you how DB2 pureScale communicates with the CF to declare its intent to access a data page. Steps 2 and 3 are the critical success factors to DB2 pureScale efficiency. That is, when there is a need to communicate with the centralized CF, that communication uses RDMA. Essentially, the process on member 1 writes directly into the memory of the CF with its request. This is done without going through the IP socket stack, without context switching and in many cases without having to yield the CPU (the round trip communication time between the two servers can be as little as 15 microseconds).
  • To dive deeper into the “secret sauce,” let's look at exactly how the member communicates with the CF. If an agent on Member 1 wants to read a page, that agent writes directly into the memory of the CF, telling the CF exactly what page it wants and even what slot in Member 1's buffer pool the page should go into. If the CF does not have the page, it writes a message right into the memory of Member 1 to indicate that it doesn't have the page. If the CF does have the page, it writes the data page directly into memory on Member 1, without any context switching or IP stack calls.
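The Read-and-Register exchange just described can be sketched as follows. Real RDMA uses registered memory regions and hardware verbs; here plain dicts stand in for the member's buffer pool and the CF's group buffer pool, so only the shape of the protocol is modeled, and all names are illustrative.

```python
def read_and_register(cf_group_pool: dict, member_pool: dict,
                      page_no: int, slot: int) -> str:
    """Member asks the CF for a page, naming up front the buffer-pool slot
    the response should land in; the CF 'writes' straight into that slot."""
    page = cf_group_pool.get(page_no)      # CF lookup in the group pool
    if page is None:
        return "read_from_disk"            # CF: "I don't have it"
    member_pool[slot] = page               # direct write into the named slot
    return "page_delivered"

cf = {501: b"row data for page 501"}       # toy group buffer pool
local_pool = {}                            # toy member buffer pool

# "I want page 501. Put it into slot 42 of my buffer pool."
assert read_and_register(cf, local_pool, 501, slot=42) == "page_delivered"
assert local_pool[42] == b"row data for page 501"

# A page the CF doesn't hold falls back to a disk read by the member.
assert read_and_register(cf, local_pool, 777, slot=7) == "read_from_disk"
```

Because the requester pre-declares the destination slot, the response needs no reply-handling logic on the member's side, which is what lets the agent stay on the CPU for the round trip.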
  • As previously mentioned, the critical success factor for scalability in an active-active cluster is to ensure that when a transaction requests a piece of data, it can get that data with the lowest possible latency. With DB2 pureScale, by centralizing data that is of interest to more than one member in the cluster, and by accessing that data using RDMA in an interrupt-free processing environment, you can see near-linear scalability even out to dozens of nodes. More importantly, you do not need to design your application to be cluster aware. There is no need to route transactions that access the same data pages to a single node. In practice, this is not the case with Oracle RAC. There are many stories on the internet, and in published books on Oracle RAC, that tell customers to avoid hot pages being passed between nodes by using one of the methods described in the last 4 bullets of the above slide. These methods require costly DBA and application developer interventions, as well as potential application rework as the size of the cluster changes.
  • To demonstrate the scalability of DB2 pureScale, the lab set up a configuration comprising 128 members (note that for server consolidation environments it is possible to put multiple members on an SMP server). A workload was created with a typical read-to-write ratio of 90:10. As well, to prove the scalability of the architecture, the application has no cluster awareness. In fact, the application updates or selects a random row, so every row in the database will be touched by all members in the cluster (we did this to show that locality of data is not as essential for scaling as with other shared-disk architectures). The results of this 128-member test show near-linear scaling even out to 128 members in the cluster. Up to 64 members, the scalability (compared to the 1-member result) is still above 95%, and at 128 members the scalability was 84%. Note that this is a validation of the architecture and includes some capabilities under development that will not be in the December GA code.
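The scalability percentages quoted above follow a standard formula: measured throughput at N members divided by N times the single-member throughput. The sketch below shows the computation; the baseline and the 128-member throughput figures are invented to reproduce the quoted 95% and 84% efficiencies, since the actual transaction rates were not published here.

```python
def scaling_efficiency(members: int, throughput: float,
                       baseline: float) -> float:
    """Measured throughput as a fraction of perfect linear scaling."""
    return throughput / (members * baseline)

BASELINE_TPS = 10_000  # 1-member throughput (illustrative)

# 64 members at 95% efficiency and 128 members at 84%, as quoted above.
assert scaling_efficiency(64, 64 * BASELINE_TPS * 0.95, BASELINE_TPS) == 0.95
assert round(scaling_efficiency(128, 1_075_200, BASELINE_TPS), 2) == 0.84
```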
  • The second key feature of DB2 pureScale is the high availability it provides. Again, the secret to its success is the centralized locking and caching. When one member fails, all other members in the cluster can continue to process transactions. The only data that is unavailable is the actual set of pages that were being updated in flight when the member failed. And if those pages are hot, they will be in CF memory, which means the recovery of pages needed by other members will be very fast.
  • At a high level, three things occur during a member failure: failure detection; pulling the pages that need to be fixed directly from CF memory; and fixing the pages. In each of these steps, DB2 pureScale has been optimized with the goal of getting these pages fixed and accessible again in under 20 seconds (all the while, the rest of the data in the database is completely available).
  • Failure detection was a large part of the investment that went into DB2 pureScale. Software failure in a DB2 pureScale environment has been architected to be caught in a fraction of a second, with recovery processing beginning within that same second. Hardware failure is a more difficult challenge, but thanks to some innovative techniques, DB2 pureScale has built in a set of algorithms that can detect node failures in as little as 3 seconds without false failovers. When we talk about having the rows available again within 20 seconds of a failure, we mean from the time the failure occurred, not the time the failure was detected. Other vendors may exclude this detection time to give better numbers, but from an end user's perspective this time is critical, so we include it.
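The trade-off described above, namely fast detection versus false failovers, can be illustrated with a generic heartbeat model. DB2 pureScale's actual detection algorithms are not disclosed in this deck, so the mechanism, function name, and timing constants below are all assumptions for illustration.

```python
def node_declared_dead(last_heartbeat: float, now: float,
                       interval: float = 1.0, missed_limit: int = 3) -> bool:
    """Declare failure only after `missed_limit` heartbeat intervals pass,
    trading a little detection latency for resistance to false positives."""
    return (now - last_heartbeat) >= interval * missed_limit

# A 2-second silence (e.g. a brief network blip) is tolerated...
assert node_declared_dead(last_heartbeat=10.0, now=12.0) is False

# ...but a 3-second silence crosses the threshold and triggers recovery,
# matching the "as little as 3 seconds" hardware-failure window quoted above.
assert node_declared_dead(last_heartbeat=10.0, now=13.0) is True
```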
  • Here is a detailed walkthrough of what happens when a node fails (run this in slide show mode to see the steps). Note that we call this process “Online Failover” because transactions on the other members are not impacted in any way (which is different from Oracle RAC, as you will see in the following slides). As well, the data that needs to be fixed will primarily be in memory on the CF, so recovery works at memory speeds. In the event of a hardware failure, we take the additional step of automatically fencing off storage access from the failed member to prevent split-brain issues.
  • In Oracle there is a similar set of steps to recover from a failed instance: (1) node failure detection, (2) global lock remastering, (3) locking the pages that need recovery, and (4) fixing those pages. However, the middle two steps are where things are very different compared to DB2 pureScale. The biggest difference is that DB2 pureScale has centralized locking, so there is no need to remaster global locks. Also, pureScale does not need to find the pages to lock (it is already aware of the pages that need to be fixed).
  • In Oracle RAC, each data page (called a data block in Oracle) is mastered by one of the instances in the cluster. Oracle employs a distributed locking mechanism, and therefore each instance in the cluster is responsible for managing and granting lock requests for the pages that it masters. In the event of a node failure, the data pages for the failed node become momentarily orphaned while RAC goes through a lock redistribution process to assign new ownership of these orphaned pages to the surviving nodes in the cluster. This is called Global Resource Directory (GRD) reconfiguration, and while it is occurring, any request to read a page, as well as any request to lock a page, is momentarily frozen. Applications can continue to process on the surviving nodes; however, during this time, they cannot perform any I/O operations or request any new locks. This results in many applications experiencing a freeze, as shown in this slide.
  • The second step in the Oracle RAC node recovery process is to lock all the data pages that need recovery. This must be done before the GRD freeze described earlier is released. If an instance was allowed to read a page from disk before the appropriate page locks were acquired, the update from the failed instance could be lost. The recovery instance performs a first pass read of the redo log file from the failed instance and locks any pages that need recovery as shown in Figure 2. This may require a significant number of random I/O operations as the log file, and potentially the pages that need recovery, may not be in the memory of any of the surviving nodes. The GRD freeze is lifted and the stalled applications can continue processing only after all these I/O operations are performed by the recovery instance and the appropriate pages are locked. Depending on the amount of work that the failed node was doing at the time of the failure, this process can take from tens of seconds up to as much as a minute before it completes. This GRD freeze and the fact that I/O operations cannot be performed during this period or new lock requests granted, is documented in several published books on Oracle RAC.
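The remastering work described above can be sketched with a toy placement model. Oracle's actual GRD placement is internal, so the hashing scheme, names, and counts below are purely illustrative; the point is that when one instance fails, every page it mastered must be reassigned before the cluster-wide freeze can lift.

```python
def master_of(page_no: int, instances: list) -> str:
    """Pick the instance that masters a page (toy hash placement)."""
    return instances[page_no % len(instances)]

def remaster(pages, old_instances: list, surviving_instances: list) -> dict:
    """Reassign every page mastered by a failed instance to a survivor.
    In RAC, lock grants and I/O are frozen while this redistribution runs."""
    failed = set(old_instances) - set(surviving_instances)
    moved = {}
    for p in pages:
        if master_of(p, old_instances) in failed:
            moved[p] = master_of(p, surviving_instances)
    return moved

pages = range(1000)
moved = remaster(pages, ["inst1", "inst2", "inst3"], ["inst1", "inst3"])

# Roughly a third of all pages change master when one of three nodes fails,
# and none of this bookkeeping exists with a centralized CF.
print(len(moved))
```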
  • In comparison, DB2 pureScale environments require no global freeze in the cluster. The CF is aware at all times which pages would need recovery should any member fail. If a member fails, all other members in the cluster can continue to run transactions and perform I/O operations. Only requests to access pages that need recovery will be blocked while the recovery process cleans up from the failed member as shown on this slide (and the process is likely to happen from memory).
    1. 1. Showdown: DB2 vs. Oracle Database for OLTP Conor O’Mahony Email: [email_address] Twitter: conor_omahony Blog: db2news.wordpress.com Conor O’Mahony Email: [email_address] Twitter: conor_omahony Blog: database-diary.com
    2. 2. Agenda <ul><li>What Improves OLTP Performance? </li></ul><ul><li>Comparing OLTP Performance </li></ul><ul><li>Comparing Cluster Scalability </li></ul><ul><li>Comparing Cluster Availability </li></ul><ul><li>What Users are Saying </li></ul>
    3. 3. Technology for OLTP Performance <ul><li>Efficient I/O </li></ul><ul><ul><li>High performing data access </li></ul></ul><ul><ul><li>Logger most critical choke point </li></ul></ul><ul><li>Large Memory Exploitation </li></ul><ul><ul><li>Multiple buffer pools </li></ul></ul><ul><li>Ability to handle lots of users </li></ul><ul><ul><li>Low memory footprint per user </li></ul></ul><ul><ul><li>Connection concentrator </li></ul></ul>
    4. 4. Efficient I/O <ul><li>Logging is the most critical factor in high volume OLTP </li></ul><ul><li>TPC-C proves that DB2 has a more efficient logger </li></ul><ul><ul><li>DB2 logs less than ½ of Oracle Database and SQL Server </li></ul></ul><ul><ul><ul><li>DB2 9 = 2.4KB of log per transaction </li></ul></ul></ul><ul><ul><ul><li>Oracle 10gR2 = 4.9KB of log per transaction </li></ul></ul></ul><ul><ul><ul><li>SQL Server 2005 = 6.0KB of log per transaction </li></ul></ul></ul><ul><li>Reduced logging = increased performance </li></ul>
    5. 5. Large Memory and Efficient Memory Usage <ul><li>TPC-C benchmark with DB2 on p5 595 used 2TB of real memory </li></ul><ul><ul><li>1.9TB of buffer pool space allocated by DB2 </li></ul></ul><ul><ul><ul><li>Using multiple buffer pools of different page sizes </li></ul></ul></ul><ul><li>DB2 is now threaded </li></ul><ul><ul><li>One flat memory address space for all of DB2 </li></ul></ul><ul><ul><li>No more private memory (all shared) </li></ul></ul><ul><ul><ul><li>Reduces per connection footprint by 1MB </li></ul></ul></ul><ul><li>More efficient memory access = better performance </li></ul>
    6. 6. User Scalability <ul><li>DB2 has a proven ability to support lots of users </li></ul><ul><li>Supports two connection methods: </li></ul><ul><ul><li>Each client has connection process on server </li></ul></ul><ul><ul><li>Server processes are shared amongst incoming requests </li></ul></ul><ul><li>DB2 has a lower memory footprint per agent (connection) than other vendors </li></ul><ul><ul><li>Enables more client connections for a given amount of memory </li></ul></ul><ul><li>Better user scalability = more throughput </li></ul>
    7. 7. Agenda <ul><li>What Improves OLTP Performance? </li></ul><ul><li>Comparing OLTP Performance </li></ul><ul><li>Comparing Cluster Scalability </li></ul><ul><li>Comparing Cluster Availability </li></ul><ul><li>What Users are Saying </li></ul>
    8. 8. Transactional Performance <ul><li>Two large-scale standardized performance benchmarks for transactional workloads are often used for comparisons </li></ul><ul><ul><li>TPC-C – industry standard OLTP benchmark </li></ul></ul><ul><ul><li>SAP SD – represents an SAP R/3 Sales and Distribution application </li></ul></ul>
    9. 9. Longevity in TPC-C Performance Results as of April 21, 2008
    10. 10. Apples-to-Apples Comparison <ul><li>Both on exactly the same server (8way 1.9GHz p5 570) </li></ul><ul><li>DB2 v8 vs. Oracle 10g </li></ul><ul><li>DB2 leads in performance by 16% over Oracle </li></ul>16% Faster Results current as of Feb 24, 2008 Check http://www.tpc.org for latest results
    11. 11. Longevity in SAP 3-Tier SD Performance Results as of Jan 8, 2008
    12. 12. SAP SD 3-tier <ul><li>This benchmark represents a 3-tier SAP R/3 environment </li></ul><ul><ul><li>Database on separate server </li></ul></ul><ul><li>DB2 outperforms Oracle by 68% with half the number of cores </li></ul><ul><ul><li>DB2 running on 32-way p5 595 </li></ul></ul><ul><ul><li>Oracle running on 64-way HP </li></ul></ul>
    13. 13. 2-tier SAP SD Benchmarks <ul><li>DB2 on IBM Power leads Oracle on HP by 18% on ½ the number of cores </li></ul><ul><li>DB2 on IBM Power delivers significantly more SAPS/Watt saving you money </li></ul>Results as of April 8, 2008
    14. 14. Agenda <ul><li>What Improves OLTP Performance? </li></ul><ul><li>Comparing OLTP Performance </li></ul><ul><li>Comparing Cluster Scalability </li></ul><ul><li>Comparing Cluster Availability </li></ul><ul><li>What Users are Saying </li></ul>
    15. 15. Oracle RAC - Single Instance Wants to Read a Page <ul><li>Process on Instance 1 wants to read page 501 mastered by instance 2 </li></ul><ul><ul><li>System process checks local buffer pool: page not found </li></ul></ul><ul><ul><li>System process sends an IPC to the Global Cache Service process to get page 501 </li></ul></ul><ul><ul><ul><li>Context Switch to schedule GCS on a CPU </li></ul></ul></ul><ul><ul><ul><li>GCS copies request to kernel memory to make TCP/IP stack call </li></ul></ul></ul><ul><ul><li>GCS sends request over to Instance 2 </li></ul></ul><ul><ul><ul><li>IP receive call requires interrupt processing on remote node </li></ul></ul></ul><ul><ul><li>Remote node responds back via IP interrupt to GCS on Instance 1 </li></ul></ul><ul><ul><li>GCS sends IPC to System process (another context switch to process request) </li></ul></ul><ul><ul><li>System process performs I/O to get the page </li></ul></ul>Buffer Cache Instance 2 Buffer Cache system 1 2 3 4 GCS GCS 5 6 501 501 Instance 1
    16. 16. What Happens in DB2 pureScale to Read a Page <ul><li>Agent on Member 1 wants to read page 501 </li></ul><ul><ul><li>db2agent checks local buffer pool: page not found </li></ul></ul><ul><ul><li>db2agent performs Read And Register (RaR) RDMA call directly into CF memory </li></ul></ul><ul><ul><ul><li>No context switching, no kernel calls. </li></ul></ul></ul><ul><ul><ul><li>Synchronous request to CF </li></ul></ul></ul><ul><ul><li>CF replies that it does not have the page (again via RDMA) </li></ul></ul><ul><ul><li>db2agent reads the page from disk </li></ul></ul>Group Buffer Pool CF Buffer Pool Member 1 db2agent 1 2 3 4 501 501 PowerHA pureScale
19. 19. The Advantage of DB2 Read and Register with RDMA <ul><li>DB2 agent on Member 1 writes directly into CF memory with: </li></ul><ul><ul><li>Page number it wants to read </li></ul></ul><ul><ul><li>Buffer pool slot that it wants the page to go into </li></ul></ul><ul><li>CF responds by writing directly into memory on Member 1, either: </li></ul><ul><ul><li>That it does not have the page or </li></ul></ul><ul><ul><li>With the requested page of data </li></ul></ul><ul><li>Total end-to-end time for RaR is measured in microseconds </li></ul><ul><li>Calls are so fast that the agent may even stay on the CPU for the response </li></ul>Much more scalable, and does not require locality of data. [Diagram: db2agent on Member 1 writes its request ("I want page 501. Put it into slot 42 of my buffer pool.") directly into CF memory; the CF thread writes the response ("I don't have it, get it from disk") directly back into Member 1's memory]
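The RaR exchange described above can be sketched as a toy model. The function and structure names are illustrative only, and a plain dictionary write stands in for the RDMA transfer; this is not the DB2 wire protocol.

```python
# Toy model of Read And Register (RaR): the member tells the CF which page
# it wants and which buffer pool slot to place it in; the CF "writes" the
# answer directly into that slot (a dict write standing in for RDMA).
def rar_request(cf_pages, member_buffer_pool, page_no, slot):
    if page_no in cf_pages:
        member_buffer_pool[slot] = cf_pages[page_no]  # CF has it: write page
        return "page"
    return "not-found"  # caller must read the page from disk

cf = {501: b"page 501 contents"}
bufpool = {}
assert rar_request(cf, bufpool, 501, 42) == "page"      # CF has the page
assert bufpool[42] == b"page 501 contents"              # landed in slot 42
assert rar_request(cf, bufpool, 999, 7) == "not-found"  # go to disk
```

Either way, the requester gets its answer in a single synchronous round trip, with no intermediate process to schedule.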
20. 20. Transparent Application Scalability <ul><li>Scalability without application or database partitioning </li></ul><ul><ul><li>Centralized locking and a real global buffer pool with RDMA access result in real scaling without making the application cluster-aware </li></ul></ul><ul><ul><ul><li>Sharing of data pages is via RDMA from a true shared cache </li></ul></ul></ul><ul><ul><ul><ul><li>(not synchronized access via process interrupts between servers) </li></ul></ul></ul></ul><ul><ul><ul><li>No need to partition application or data for scalability </li></ul></ul></ul><ul><ul><ul><ul><li>Resulting in lower administration and application development costs </li></ul></ul></ul></ul><ul><ul><li>Distributed locking in RAC results in higher overhead and lower scalability </li></ul></ul><ul><ul><ul><li>Oracle RAC best practices recommend </li></ul></ul></ul><ul><ul><ul><ul><li>Fewer rows per page (to avoid hot pages) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Partitioning the database to avoid hot pages </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Partitioning the application to get some level of scalability </li></ul></ul></ul></ul><ul><ul><ul><ul><li>All of these result in higher management and development costs </li></ul></ul></ul></ul>
21. 21. Scalability for OLTP Applications <ul><li>2, 4, and 8 members: over 95% scalability </li></ul><ul><li>16 members: over 95% scalability </li></ul><ul><li>32 members: over 95% scalability </li></ul><ul><li>64 members: 95% scalability </li></ul><ul><li>88 members: 90% scalability </li></ul><ul><li>112 members: 89% scalability </li></ul><ul><li>128 members: 84% scalability </li></ul>
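These efficiency figures imply an effective speedup of roughly members × efficiency. The table below treats "over 95%" as exactly 95% for the estimate:

```python
# Effective speedup implied by the scalability figures above
# ("over 95%" is treated as 95% for this estimate).
SCALING = {2: 0.95, 4: 0.95, 8: 0.95, 16: 0.95, 32: 0.95,
           64: 0.95, 88: 0.90, 112: 0.89, 128: 0.84}

def effective_speedup(members):
    """Throughput relative to a single member: members * efficiency."""
    return members * SCALING[members]

print(round(effective_speedup(128), 2))  # roughly 107.5x one member
```

In other words, even at 128 members and 84% efficiency, the cluster still delivers over 100 times the throughput of a single member.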
    22. 22. Agenda <ul><li>What Improves OLTP Performance? </li></ul><ul><li>Comparing OLTP Performance </li></ul><ul><li>Comparing Cluster Scalability </li></ul><ul><li>Comparing Cluster Availability </li></ul><ul><li>What Users are Saying </li></ul>
23. 23. Online Recovery <ul><li>The DB2 pureScale design point is to maximize availability during failure recovery processing </li></ul><ul><li>When a database member fails, only in-flight data remains locked until member recovery completes </li></ul><ul><ul><li>In-flight = data being updated on the failed member at the time it failed </li></ul></ul><ul><li>Target time to row availability </li></ul><ul><ul><li><20 seconds </li></ul></ul>[Diagram: four DB2 members with logs, two CFs, and shared data; chart of % of data available vs. time showing only in-flight updates locked between member failure and recovery completion]
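The availability curve on this slide can be expressed as a simple step function. The 2% in-flight fraction below is a made-up illustrative number; the 20-second window is the stated design target.

```python
# Step-function sketch of the availability curve above. The in-flight
# fraction (2%) is an illustrative assumption; the 20-second recovery
# window is the stated design target.
def available_fraction(t_seconds, recovery_time=20.0, in_flight=0.02):
    """Fraction of rows readable t seconds after a member failure."""
    if t_seconds < recovery_time:
        return 1.0 - in_flight   # only in-flight rows remain locked
    return 1.0                   # recovery done: all locks released

assert abs(available_fraction(5) - 0.98) < 1e-9   # during recovery
assert available_fraction(25) == 1.0              # after recovery
```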
    24. 24. Steps Involved in DB2 pureScale Member Failure <ul><li>Failure Detection </li></ul><ul><li>Recovery process pulls directly from CF: </li></ul><ul><ul><li>Pages that need to be fixed </li></ul></ul><ul><ul><li>Location of Log File to start recovery from </li></ul></ul><ul><li>Restart Light Instance performs redo and undo recovery </li></ul>
25. 25. Failure Detection for Failed Member <ul><li>DB2 has a watchdog process to monitor itself for software failure </li></ul><ul><ul><li>The watchdog is signaled any time the DB2 member is dying </li></ul></ul><ul><ul><li>The watchdog interrupts the cluster manager to tell it to start recovery </li></ul></ul><ul><ul><li>Software failure detection times are a fraction of a second </li></ul></ul><ul><li>The DB2 cluster manager performs very low-level, sub-second heartbeating (with negligible impact on resource utilization) </li></ul><ul><ul><li>The DB2 cluster manager performs other checks to distinguish congestion from failure </li></ul></ul><ul><ul><li>The result is hardware failure detection in under 3 seconds without false failovers </li></ul></ul>
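The sub-second heartbeating described above can be sketched as a miss-count failure detector. The 0.5 s interval and 5-miss threshold here are illustrative choices that land under the 3-second detection target, not DB2's actual tuning.

```python
# Miss-count failure detector sketch: declare a member failed once it
# has been silent for missed_threshold heartbeat intervals. With the
# illustrative values here, detection takes 5 * 0.5s = 2.5s, under the
# 3-second target stated above.
def detect_failures(last_beats, now, interval=0.5, missed_threshold=5):
    """Map member -> True if its last heartbeat is too old."""
    return {member: (now - last) > interval * missed_threshold
            for member, last in last_beats.items()}

beats = {"member1": 10.0, "member2": 12.4}  # last heartbeat timestamps
status = detect_failures(beats, now=12.6)
assert status == {"member1": True, "member2": False}
```

A real detector would also apply the congestion checks mentioned above before declaring failure, to avoid false failovers on a busy but healthy member.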
26. 26. Member Failure Summary <ul><li>Member Failure </li></ul><ul><li>DB2 Cluster Services automatically detects the member’s death </li></ul><ul><ul><li>Informs other members and CFs </li></ul></ul><ul><ul><li>Initiates automated member restart on the same or a remote host </li></ul></ul><ul><ul><li>Member restart is like crash recovery in a single system, but much faster </li></ul></ul><ul><ul><ul><li>Redo limited to in-flight transactions </li></ul></ul></ul><ul><ul><ul><li>Benefits from the page cache in the CF </li></ul></ul></ul><ul><li>Clients transparently re-routed to healthy members </li></ul><ul><li>Other members fully available at all times – “Online Failover” </li></ul><ul><ul><li>CF holds the update locks held by the failed member </li></ul></ul><ul><ul><li>Other members can continue to read and update data not locked by the failed member </li></ul></ul><ul><li>Member restart completes </li></ul><ul><ul><li>Locks released and all data fully available </li></ul></ul>[Diagram: clients connect to any DB2 member in a single database view; primary and secondary CFs hold updated pages and global locks over the shared data; one member is terminated with "kill -9"]
27. 27. Steps Involved in a RAC Node Failure <ul><li>Node failure detection </li></ul><ul><li>Data block remastering </li></ul><ul><li>Locking of pages that need recovery </li></ul><ul><li>Redo and undo recovery </li></ul>Unlike DB2 pureScale, Oracle RAC does not centralize its lock management or its data cache
28. 28. With RAC – Access to GRD and Disks Are Frozen <ul><li>Global Resource Directory (GRD) redistribution </li></ul><ul><li>No lock updates during GRD redistribution </li></ul><ul><li>No more I/O until pages that need recovery are locked </li></ul>[Diagram: Instance 1 fails; I/O requests are frozen on Instances 2 and 3 while the GRD is redistributed]
29. 29. With RAC – Pages that Need Recovery Are Locked <ul><li>Recovery instance reads the redo log of the failed node </li></ul><ul><li>Recovery instance locks pages that need recovery </li></ul><ul><li>Must read the log and lock pages before the freeze is lifted </li></ul>[Diagram: Instance 1 fails; I/O requests remain frozen on Instances 2 and 3 while the recovery instance scans the redo log and locks the affected pages]
30. 30. DB2 pureScale – No Freeze at All <ul><li>No I/O freeze when Member 1 fails </li></ul><ul><li>The CF, as central lock manager, always knows what changes are in flight </li></ul><ul><ul><li>CF knows which rows on the affected pages had in-flight updates at the time of failure </li></ul></ul>[Diagram: Member 1 fails; Members 2 and 3 continue unimpeded while the CF central lock manager tracks the in-flight rows]
31. 31. Agenda <ul><li>What Improves OLTP Performance? </li></ul><ul><li>Comparing OLTP Performance </li></ul><ul><li>Comparing Cluster Scalability </li></ul><ul><li>Comparing Cluster Availability </li></ul><ul><li>What Users are Saying </li></ul>
32. 32. Sample of Feedback… <ul><li>“We compared DB2 with Oracle and found that DB2 was more reliable and has a better price-performance ratio.” </li></ul><ul><ul><li>― Dr. Yuki Kitaoka, Kyoto National Hospital </li></ul></ul><ul><li>“We chose DB2 over Oracle and Sybase for its proven performance, availability and cost-effectiveness.” </li></ul><ul><ul><li>― Atakan Karaman, Anadolu Group </li></ul></ul><ul><li>“DB2 Universal Database is the bank’s database of choice for its high availability, scalability and performance.” </li></ul><ul><ul><li>― Patrick Stearn, Bayerische Landesbank </li></ul></ul><ul><li>“In comparison tests with both Oracle and Microsoft, DB2 continually demonstrated a better price-to-performance ratio -- the quality of DB2 is quite astonishing.” </li></ul><ul><ul><li>― Benjamin Simmen, Zurich Financial Services </li></ul></ul><ul><li>“When making our decision, we discovered that a $250,000 DB2/Linux solution performed better than a $1.5 million Oracle/Solaris solution.” </li></ul><ul><ul><li>― Carl Ansley, CTO, Clarity Payment Solutions, Inc. </li></ul></ul>
    33. 33. Thank You! <ul><li>For more information, see www.ibm.com/db2 </li></ul><ul><li>Or contact me at: </li></ul>Conor O’Mahony Email: [email_address] Twitter: conor_omahony Blog: database-diary.com
