Let's get started! My name is Gwen Shapira, and I'm a senior consultant for Pythian. We are here to discuss the performance of the Oracle Database Appliance (ODA). I get two types of performance questions from companies considering ODA: "I need to scale my application. Is ODA the answer?" and "I'm planning to move to ODA for other reasons. How do I know I'll still get the performance I need?" This presentation will address these two questions and give you an idea of which applications and workloads are a good fit for ODA, and what kind of performance you can expect.
Alex Gorbachev is Pythian's CTO and President of the RAC special interest group. He ran many of the tests and benchmarks that we'll show in this presentation. I'm a senior consultant for Pythian with many years of RAC experience. I ran other benchmarks and will be presenting the results here. We are both Oracle ACE Directors and members of the Oak Table Network.
- A successful, growing business for more than 10 years
- Served many customers with complex requirements and infrastructure, just like yours
- Operate globally for 24x7 "always awake" services
Enough about us. Let's talk about ODA. "Simple" and "RAC" did not use to appear in the same sentence. RAC is a complex system with many components and dependencies on storage and network. Setting up a RAC system requires a lot of coordination between network admins, storage admins, sysadmins and DBAs. It's considered a large project and can take a long time (weeks) to get going right. ODA is intended to be a plug-and-play solution. You get going relatively quickly (hours instead of days or weeks), and with a pre-configured system it is more difficult to get things wrong. It doesn't have to be RAC! One customer asked us if he can have a Data Guard standby with the primary on one node and the standby on the other. Not recommended, but definitely a possibility!
Interconnect and storage have a big impact on performance.
You see 24 disks here and various indicator lights. The upper row has the 4 SSD disks. If you need to replace a disk – this is where you do it.
On the left: power supply. 4 network ports in two bonded interfaces for backups and DR. The two large ports below are the 10GbE public database interface. On the right panel: leftmost is the serial connector to the console, then 2x1GbE for the public network, the Ethernet ILOM connection and USB+video connectors. What you don't see is the interconnect. There are two on-board integrated interconnect interfaces. They are not bonded, but both are used for redundancy.
This is the part that plugs into the back plane, with the interconnects and power supply.
When we do forklift migrations to Exadata (i.e. with no application changes), we are always impressed by the performance improvements. A 10x improvement is not rare, it's expected. Some of it is due to the Exadata secret sauce (mostly not included in ODA). But some is due to a modern, well thought out hardware architecture that is pre-configured. And some improvements are due to 11gR2 optimizations. With ODA you get two of the big Exadata benefits. Westmere cores are the fastest you can get at 95W (easier to cool).
Each node has two HBAs, each connected to two expanders. Each expander is connected to 12 disks. Each disk has two ports, so it has connectivity to both nodes. There are two paths from the node to each disk, through both HBAs, but you don't need to configure multipathing or even know what it is. It is pre-configured. Another nice thing is the high availability: any component can (and will) fail without impacting the system availability. Pull out a disk, a cable, an entire node, shoot a hole through an HBA, and the system will keep running.
This is a pretty good deal. Going with HP hardware, it will be close to 25K just for the DB servers, and you still need the interconnect network, shared storage and its network and SSD.
Don't believe benchmarks! But here's my test for a small (10G) OLTP database.
Note that we have two interconnect interfaces, for redundancy. Jumbo frames are configured by default. This is a big deal: jumbo frames improve performance and are normally a pain to set up for RAC.
Note that we used "cheats" to improve application scalability: sequences were created with a cache size of 1000 (not the usual 20!) and many indexes are reversed to reduce contention. These "cheats" can improve scalability, so you should use them too!
GC Wait doesn’t necessarily mean the interconnect is a problem
128 MB/s theoretical saturation for a 1 Gb/s interconnect link.
You want writes to the redo log to be as fast as possible, because a committing transaction has to wait until its redo is written to disk before it can move on. This is a serial part and can quickly become a bottleneck and impact the performance of an entire instance. From our previous benchmarks, we were already pretty sure that the ODA configuration does not pose specific problems in this regard, but we wanted to take a closer look and find the limits of how much redo we can push.
Let's start with something important you need to know about ODA: it is intended to be a RAC cluster. Therefore the storage has to be shared, which means that there can't be any non-shared cache between the storage and the database. SANs have cache to speed up redo processing, because its performance is so critical and we don't want anything slowing down commits, but ODA can't do it. This means that excessive IO can impact the latency of the HDDs. Traditional systems place redo on its own array or carefully configure the SAN to reduce redo latency. ODA takes an easy solution: SSD. Of course, datafiles are still written to normal disks, which can get congested, so tuning DBWR to avoid excessive IO is still recommended.
I've read Oracle's claim that no redo log write will take more than 0.5 ms. In my ORION benchmarks doing sequential 32K writes (Figure 2), I achieved around 4,000 writes per second to the SSD disks, accounting for ASM high redundancy (i.e., one redo write is written to three disks), with eight to ten parallel threads. This corresponds to four to five RAC databases with each instance aggressively pounding redo logs. In this situation, the average write time is still around 0.5 ms. Note that because of the piggyback effect of multiple commits, the effective achievable transaction rate is actually higher. On a corporate SAN we are often happy with 2-3 ms commit times.
Without writes: 20 ms latency with 4700 IOPS. With 40% writes, it's 4500 IOPS.
Depending on the patterns of parallel scans, I was able to get up to 2.4 GB/s using ORION on a single node with 1GB reads. Oracle specs of ODA claim up to a 4 GB/s scan rate. We didn't test both nodes, so we don't know what we can realistically reach.
Shapira oda perf_webinar_v2
An Insider's Guide to ODA Performance
Prepared by: Alex Gorbachev, Pythian CTO & Gwen Shapira
Presented by: Gwen Shapira, Senior Pythian Consultant

Alex Gorbachev: CTO, Pythian; President, Oracle RAC SIG
Gwen Shapira: Senior Consultant, Pythian; Oracle ACE Director
Why Companies Trust Pythian
Recognized Leader:
• Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server
• Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments
Expertise:
• One of the world's largest concentrations of dedicated, full-time DBA expertise.
Global Reach & Scalability:
• 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
Where's the Interconnect?

[root@odaorcl1 ~]# /u01/app/184.108.40.206/grid/bin/oifcfg getif
eth0  192.168.16.0  global  cluster_interconnect
eth1  192.168.17.0  global  cluster_interconnect
bond0  172.20.31.0  global  public

eth0  Link encap:Ethernet  HWaddr 00:21:28:E7:C3:72
      inet addr:192.168.16.24  Bcast:192.168.16.255
      inet6 addr: fe80::221:28ff:fee7:c372/64
      UP BROADCAST RUNNING MULTICAST  MTU:9000
[root@odaorcl1 ~]# ethtool eth0
Settings for eth0:
    Supported ports: [ FIBRE ]
    Supported link modes: 1000baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes: 1000baseT/Full
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: on
    Supports Wake-on: pumbg
    Wake-on: d
    Current message level: 0x00000001 (1)
    Link detected: yes
But Wait!

Event                           Waits   Time(s)  Avg wait (ms)  % DB time  Wait Class
------------------------  -----------  --------  -------------  ---------  ----------
DB CPU                                    6,459                      29.9
buffer busy waits             123,162     3,725             30       17.3  Concurrenc
gc buffer busy release          8,871     3,383            381       15.7  Cluster
gc current block 2-way      3,282,774     1,969              1        9.1  Cluster
gc buffer busy acquire         11,073     1,364            123        6.3  Cluster
But Wait!

Event                           Waits   Time(s)  Avg wait (ms)  % DB time  Wait Class
------------------------  -----------  --------  -------------  ---------  ----------
enq: US - contention        1,123,271    33,733             30       38.2  Other
enq: HW - contention           42,551    17,317            407       19.6  Configurat
buffer busy waits             156,152    11,550             74       13.1  Concurrenc
latch: row cache objects      798,648     6,181              8        7.0  Concurrenc
DB CPU                                    5,796                       6.6
"I need that buffer."
"I'm busy!"
Waiting...
381 ms later: "Here's the buffer!"
Interconnect Again

Used By           Send Mbytes/sec  Receive Mbytes/sec
----------------  ---------------  ------------------
Global Cache                48.94               43.04
Parallel Query                .00                 .00
DB Locks                     4.99                5.23
DB Streams                    .00                 .00
Other                         .00                 .01

Instance  Latency 500B MSG  Latency 8K MSG
1                     0.14            0.13
2                     0.58            0.69
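As a rough cross-check of these numbers, the sketch below adds up the per-component traffic from the slide and compares it with the 128 (megabytes per second) theoretical saturation figure quoted in the notes. It assumes all traffic travels over a single 1 Gb/s link and that send and receive are counted separately; both are simplifying assumptions, and the function names are made up for illustration.

```python
# Sketch only: traffic figures are copied from the slide; the single-link
# assumption and the helper names are illustrative, not from the original.

LINK_CAPACITY_MB_S = 128.0  # theoretical saturation figure quoted in the notes

# (send, receive) in MB/s per component, as shown on the slide
TRAFFIC_MB_S = {
    "Global Cache": (48.94, 43.04),
    "Parallel Query": (0.00, 0.00),
    "DB Locks": (4.99, 5.23),
    "DB Streams": (0.00, 0.00),
    "Other": (0.00, 0.01),
}


def traffic_totals(traffic=TRAFFIC_MB_S):
    """Return total (send, receive) traffic in MB/s across all components."""
    send = sum(s for s, _ in traffic.values())
    receive = sum(r for _, r in traffic.values())
    return send, receive


def utilization(mb_per_s, capacity=LINK_CAPACITY_MB_S):
    """Return the fraction of link capacity used in one direction."""
    return mb_per_s / capacity


if __name__ == "__main__":
    send, receive = traffic_totals()
    print(f"send    {send:.2f} MB/s ({utilization(send):.0%} of link)")
    print(f"receive {receive:.2f} MB/s ({utilization(receive):.0%} of link)")
```

Under these assumptions each direction stays below half of the link's theoretical capacity, which is consistent with the point that global cache waits do not necessarily mean the interconnect is the problem.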
More LGWR Performance
Saturating LGWR Test
• 3200 writes, 2 nodes, 0.2ms latency
• LGWR spent 70% of time on CPU
SwingBench Order Entry
• 4500 TPS
• Bottleneck was buffer busy contention
Big data load
• 100K size write, several ms latency
• Data warehouse load: bad fit for ODA
Capacity Planning for Migration or Consolidation

Choosing Consolidation Candidates
• Vendor limitations
• SLAs
• Dependencies
• CPU utilization
• Workload type
Big Question: Will it fit?

Collect metrics
• CPU utilization
• Memory usage: SGA + PGA
• Storage requirements
• Workload types
• I/O requirements: IOPS, throughput
• RAC: current interconnect load
CPU
Build time-based model of utilization on existing servers:

Time   S1 (8 core)  S2 (4 core)  S3 (32 core)  Total
00:00  50%          25%          10%           8*0.5+4*0.25+32*0.1 = 8.2
00:15  30%          50%          10%           7.6
00:30  100%         25%          10%           12.2

We calculated 12.2 cores in use at peak time. ODA's 24 cores give plenty of spare capacity.
You can get more accurate results by taking core speed into account. This is a rough model.
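The time-based model on this slide can be sketched in a few lines of Python. The core counts, utilization samples and the 24-core ODA total are the slide's own figures; the function and variable names are made up for illustration, and like the slide this ignores differences in core speed.

```python
# Sketch of the slide's time-based CPU model. Figures come from the slide;
# names are illustrative. Core speed is ignored, as in the rough model.

SERVER_CORES = {"S1": 8, "S2": 4, "S3": 32}

# Utilization of each server (fraction of its cores) at each sample time.
SAMPLES = {
    "00:00": {"S1": 0.50, "S2": 0.25, "S3": 0.10},
    "00:15": {"S1": 0.30, "S2": 0.50, "S3": 0.10},
    "00:30": {"S1": 1.00, "S2": 0.25, "S3": 0.10},
}

ODA_CORES = 24


def cores_in_use(sample, server_cores=SERVER_CORES):
    """Return the number of busy cores across all servers for one sample."""
    return sum(server_cores[name] * util for name, util in sample.items())


def peak_demand(samples=SAMPLES):
    """Return (time, cores) for the sample with the highest combined demand."""
    totals = {time: cores_in_use(sample) for time, sample in samples.items()}
    peak_time = max(totals, key=totals.get)
    return peak_time, totals[peak_time]


if __name__ == "__main__":
    for time, sample in SAMPLES.items():
        print(f"{time}  {cores_in_use(sample):.1f} cores")
    peak_time, peak = peak_demand()
    print(f"peak: {peak:.1f} cores at {peak_time}, "
          f"{ODA_CORES - peak:.1f} of {ODA_CORES} ODA cores spare")
```

The totals are computed per time slot before taking the maximum. Summing each server's individual peak instead would overstate demand whenever the servers are busy at different times.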
Memory
• Easiest way: Sum memory on existing servers
• Actually: Sum SGA and PGA sizes, and leave 20-30% spare
Use advisors:
• OEM gives graphs with SGA and PGA size recommendations.
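A minimal sketch of the "sum SGA and PGA, leave 20-30% spare" rule follows. The database names and sizes are hypothetical placeholders, not measurements, and the target memory size is a parameter you would fill in for your own hardware. The sketch reads "spare" as a fraction of the target machine's memory, which is one possible interpretation of the slide.

```python
# Sketch only: database names and sizes below are hypothetical examples.

HYPOTHETICAL_DATABASES = [
    {"name": "db_a", "sga_gb": 16.0, "pga_gb": 4.0},
    {"name": "db_b", "sga_gb": 8.0, "pga_gb": 2.0},
]


def memory_needed_gb(databases, spare_fraction=0.25):
    """Return the memory (GB) needed so SGA + PGA leaves spare_fraction free."""
    if not 0 <= spare_fraction < 1:
        raise ValueError("spare_fraction must be in [0, 1)")
    used = sum(db["sga_gb"] + db["pga_gb"] for db in databases)
    return used / (1 - spare_fraction)


def fits(databases, target_gb, spare_fraction=0.25):
    """Return True if the databases fit on a target with target_gb of memory."""
    return memory_needed_gb(databases, spare_fraction) <= target_gb


if __name__ == "__main__":
    needed = memory_needed_gb(HYPOTHETICAL_DATABASES)
    print(f"need about {needed:.0f} GB to keep 25% spare")
```

In practice the SGA and PGA figures would come from the advisors mentioned on the slide rather than from the currently configured values.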
IO Capacity
• OLTP and DWH go in separate boxes
• Each can be standby of the other
• Consider throughput and latency requirements
• According to our tests:
  • 12K redo IOPS at 0.5 ms latency
  • Over 3000 data file IOPS at 15ms latency
  • Almost 6000 if using outside only
  • Can reach 2.4 GB/s
Disk Space
• High redundancy: triple data usage
• Can use external storage if needed
• ZFS supports HCC
• Take backups into account
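The effect of high redundancy on disk space can be estimated with a short calculation. The factor of three comes from the slide; the sizes in the example are hypothetical, and the sketch assumes backups kept on the same storage are mirrored three ways as well.

```python
# Sketch only: example sizes are hypothetical. Assumes backups stored on the
# appliance are triple-mirrored like the data.

HIGH_REDUNDANCY_COPIES = 3  # high redundancy keeps three copies of each block


def raw_space_needed_gb(data_gb, backup_gb=0.0, copies=HIGH_REDUNDANCY_COPIES):
    """Return raw disk space (GB) needed for the given data and backups."""
    return (data_gb + backup_gb) * copies


def usable_space_gb(raw_gb, copies=HIGH_REDUNDANCY_COPIES):
    """Return usable space (GB) left from raw capacity after mirroring."""
    return raw_gb / copies


if __name__ == "__main__":
    raw = raw_space_needed_gb(data_gb=1000, backup_gb=500)
    print(f"1000 GB of data plus 500 GB of backups needs {raw:.0f} GB raw")
```

If the result exceeds what the appliance offers, the slide's other options apply: move backups or colder data to external storage.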
Testing
• Always test
• Bad tests are still better than no tests
• Replicating production load:
  • RAT
  • "Brewing Benchmarks"
  • JMeter, LoadRunner, etc.
• Especially test:
  • Migration strategy and times
  • Non-RAC applications going to RAC
  • Upgrades