Shapira oda perf_webinar_v2


  • Let's get started! My name is Gwen Shapira; I'm a senior consultant for Pythian. We are here to discuss the performance of the Oracle Database Appliance. I get two types of performance questions from companies considering ODA: "I need to scale my application. Is ODA the answer?" and "I'm planning to move to ODA for other reasons; how do I know I'll still get the performance I need?" This presentation will address both questions and give you an idea of which applications and workloads are a good fit for ODA, and what kind of performance you can expect.
  • Alex Gorbachev is Pythian's CTO and President of the RAC Special Interest Group. He ran many of the tests and benchmarks that we'll show in this presentation. I'm a senior consultant for Pythian with many years of RAC experience; I ran other benchmarks and will be presenting the results here. We are both Oracle ACE Directors and members of the Oak Table Network.
  • Successful, growing business for more than 10 years. Served many customers with complex requirements and infrastructure just like yours. Operates globally to deliver 24x7 "always awake" services.
  • Enough about us; let's talk about ODA. "Simple" and "RAC" did not use to appear in the same sentence. RAC is a complex system with many components and dependencies on storage and network. Setting up a RAC system requires a lot of coordination between network admins, storage admins, sysadmins, and DBAs. It's considered a large project, and it can take weeks to get it going right. ODA is intended to be a plug-and-play solution: you get going relatively quickly (hours instead of days or weeks), and with a pre-configured system it is more difficult to get things wrong. It doesn't have to be RAC! One customer asked us if he could have a Data Guard standby with the primary on one node and the standby on the other. Not recommended, but definitely a possibility!
  • Interconnect and storage have a big impact on performance.
  • You see 24 disks here and various indicator lights. The upper row holds the 4 SSDs. If you need to replace a disk, this is where you do it.
  • On the left: the power supply, and 4 network ports in two bonded interfaces for backups and DR. The two large ports below are the 10GbE public database interface. On the right panel: leftmost is the serial connector to the console, then 2x1GbE for the public network, the Ethernet ILOM connection, and the USB and video connectors. What you don't see is the interconnect: there are two on-board integrated interconnect interfaces, not bonded but used for redundancy.
  • This is the part that plugs into the back plane, with the interconnects and power supply.
  • When we do forklift migrations to Exadata (i.e., with no application changes), we are always impressed by the performance improvements; a 10x improvement is not rare, it's expected. Some of it is due to Exadata secret sauce (mostly not included in ODA), but some is due to a modern, well-thought-out hardware architecture that is pre-configured, and some improvements are due to 11gR2 optimizations. With ODA you get two of the big Exadata benefits. Westmere cores are the fastest you can get at 95W (easier to cool).
  • Each node has two HBAs, each connected to two expanders. Each expander is connected to 12 disks. Each disk has two ports, so it has connectivity to both nodes. There are two paths from the node to each disk, through both HBAs, but you don't need to configure multipathing or even know what it is; it is pre-configured. Another nice thing is the high availability: any component can (and will) fail without impacting system availability. Pull out a disk, a cable, an entire node, shoot a hole through an HBA, and the system will keep running.
  • This is a pretty good deal. Going with HP hardware, it would be close to $25K just for the DB servers, and you would still need the interconnect network, the shared storage and its network, and SSDs.
  • Don't believe benchmarks! But here's my test for a small (10G) OLTP database.
  • Note that we have two interconnect interfaces, for redundancy. Jumbo frames are configured by default. This is a big deal: jumbo frames improve performance and are normally a pain to set up for RAC.
  • Note that we used "cheats" to improve application scalability: sequences were created with a cache size of 1000 (not the usual 20!) and many indexes were reversed to reduce contention. These "cheats" can improve scalability, so you should use them too!
  • GC Wait doesn’t necessarily mean the interconnect is a problem
  • 128 MB/s theoretical saturation
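To see where that saturation figure comes from (my own back-of-the-envelope arithmetic, not from the slides): a 1 GbE link carries 1,000 Mbit/s, which is 125 MB/s; counting a gigabit as 1,024 Mbit gives the rounder 128 MB/s quoted here.

```python
# Theoretical ceiling of one 1 GbE interconnect link, counted two ways.
decimal_mb = 1000 / 8   # 1 GbE = 1000 Mbit/s -> 125 MB/s
binary_mb = 1024 / 8    # counting a gigabit as 1024 Mbit -> 128 MB/s
print(decimal_mb, binary_mb)  # 125.0 128.0
```

Either way, the Global Cache traffic shown later (~49 MB/s) sits well under this ceiling.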
  • You want the time to write to the redo log to be as fast as possible, because a transaction that commits has to wait until redo is written to disk before it can move on. This is a serial part, and it can quickly become a bottleneck and impact the performance of an entire instance. From our previous benchmarks, we were already pretty sure that the ODA configuration does not pose specific problems in this regard, but we wanted to take a closer look and find the limits of how much redo we can push.
  • Let's start with something important you need to know about ODA: it is intended to be a RAC cluster. Therefore the storage has to be shared, which means there can't be any non-shared cache between the storage and the database. SANs have cache to speed up redo processing, because redo performance is so critical and we don't want anything slowing down commits, but ODA can't do that. This means that excessive IO can impact the latency of the HDDs. Traditional systems place redo on its own array or carefully configure the SAN to reduce redo latency; ODA takes an easy solution: SSD. Of course, datafiles are still written to normal disks, which can get congested, so tuning DBWR to avoid excessive IO is still recommended.
  • I've read Oracle claims that no redo log write will take more than 0.5 ms. According to my ORION benchmarks doing sequential 32K writes (Figure 2), I achieved around 4,000 writes to the SSD disks, accounting for ASM high redundancy (i.e., one redo write is written to three disks), with eight to ten parallel threads. That means four to five RAC databases, with each instance aggressively pounding its redo logs. In this situation, the average write time is still around 0.5 ms. Note that because of the piggyback effect of multiple commits, the effective achievable transaction rate is actually higher. On a corporate SAN we are often happy with 2-3 ms commit times.
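The arithmetic behind those numbers (my sketch, using only the figures quoted above): with ASM high redundancy, every redo write lands on three SSDs, so ~4,000 log writes translate to roughly 12,000 physical writes spread over the SSDs.

```python
# ASM high redundancy triples every redo write across the SSDs.
log_writes_per_sec = 4000   # observed ORION sequential 32K writes
copies = 3                  # high redundancy: three mirrored copies
device_writes = log_writes_per_sec * copies
print(device_writes)        # 12000
```

That lines up with the "12K redo IOPS at 0.5 ms latency" capacity figure on the planning slide near the end.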
  • Without writes: 20 ms latency at 4,700 IOPS; with 40% writes, it's 4,500 IOPS.
  • Depending on the patterns of parallel scans, I was able to get up to 2.4 GBPS using ORION on a single node with 1GB reads. Oracle's specs for ODA claim up to a 4 GB scan rate. We didn't test both nodes, so we don't know what we can realistically reach.

    1. An Insider's Guide to ODA Performance. Prepared by: Alex Gorbachev, Pythian CTO & Gwen Shapira. Presented by: Gwen Shapira, Senior Pythian Consultant
    2. Alex Gorbachev: CTO, Pythian; President, Oracle RAC SIG. Gwen Shapira: Senior Consultant, Pythian; Oracle ACE Director
    3. Why Companies Trust Pythian. Recognized Leader: • Global industry leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and SQL Server • Work with over 150 multinational companies such as Western Union, Fox Interactive Media, and MDS Inc. to help manage their complex IT deployments. Expertise: • One of the world's largest concentrations of dedicated, full-time DBA expertise. Global Reach & Scalability: • 24/7/365 global remote support for DBA and consulting, systems administration, special projects or emergency response
    4. © 2012 Pythian
    5. Oracle Database Appliance • Simple RAC-in-a-box • 2 database servers + shared storage + interconnect • Inexpensive
    6. We will talk about: • Node Hardware • Interconnect • Storage • Benchmark results • Capacity planning tips
    7. What's in a Server Node?
    8. ODA Front View
    9. ODA Rear View
    10. System Controller View
    11. System Controller View
    12. Server Node (SN) / System Controller (SC) • Two X5675 - 3.06GHz, 6 core • 96G RAM • Two SATA 7500 RPM, 500G disks • Lots of network ports, both 1GbE and 10GbE • Identical to X2-2 Exadata node
    13. Oracle Database Appliance Storage • 20 SAS 15000 RPM 600GB • 4 SAS SSD 73GB • Each SN – 2 HBAs • Each SN – 2 Expanders • Each Expander – 12 disks • Each disk – 2 SAS ports
    14. Only $50K
    15. Sound of a Single Node Scaling
    16. Cluster Interconnect
    17. Where's the Interconnect? [root@odaorcl1 ~]# /u01/app/ getif eth0  global  cluster_interconnect eth1  global  cluster_interconnect bond0  global  public eth0      Link encap:Ethernet  HWaddr 00:21:28:E7:C3:72  inet addr:  Bcast:  inet6 addr: fe80::221:28ff:fee7:c372/64  UP BROADCAST RUNNING MULTICAST  MTU:9000
    18. [root@odaorcl1 ~]# ethtool eth0 Settings for eth0: Supported ports: [ FIBRE ]  Supported link modes: 1000baseT/Full  Supports auto-negotiation: Yes  Advertised link modes: 1000baseT/Full  Advertised auto-negotiation: Yes  Speed: 1000Mb/s  Duplex: Full  Port: FIBRE  PHYAD: 0  Transceiver: external  Auto-negotiation: on  Supports Wake-on: pumbg  Wake-on: d  Current message level: 0x00000001 (1)  Link detected: yes
    19. Interconnect Performance. Is 1GbE a problem? • Dedicated 2 x 1 GbE Fibre links • No switches • IC latency ~ 0.5 ms • Like Exadata over IB • Only 2 nodes • Workload matters
    20. Throughput – 400 VUsers
    21. But Wait! Top events: Event | Waits | Time(s) | Avg(ms) | %Time | Wait Class: DB CPU | | 6,459 | | 29.9 | ; buffer busy waits | 123,162 | 3,725 | 30 | 17.3 | Concurrency; gc buffer busy release | 8,871 | 3,383 | 381 | 15.7 | Cluster; gc current block 2-way | 3,282,774 | 1,969 | 1 | 9.1 | Cluster; gc buffer busy acquire | 11,073 | 1,364 | 123 | 6.3 | Cluster
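A note on reading the AWR excerpt above: the average-wait column is simply total wait time divided by the number of waits, converted to milliseconds. A quick check (my own arithmetic, not part of the slide):

```python
# Average wait time per event = total wait seconds / wait count, in ms.
def avg_ms(waits, time_s):
    return time_s * 1000 / waits

print(round(avg_ms(123_162, 3_725)))  # buffer busy waits -> 30 ms
print(round(avg_ms(8_871, 3_383)))    # gc buffer busy release -> 381 ms
```

That 381 ms average for "gc buffer busy release" is the number the later slide dramatizes.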
    22. But Wait! Top events: Event | Waits | Time(s) | Avg(ms) | %Time | Wait Class: enq: US - contention | 1,123,271 | 33,733 | 30 | 38.2 | Other; enq: HW - contention | 42,551 | 17,317 | 407 | 19.6 | Configuration; buffer busy waits | 156,152 | 11,550 | 74 | 13.1 | Concurrency; latch: row cache objects | 798,648 | 6,181 | 8 | 7.0 | Concurrency; DB CPU | | 5,796 | | 6.6 |
    23. I need that buffer. I'm busy! Waiting... 381 ms later: Here's the buffer!
    24. Interconnect Again. Used By | Send MB/s | Receive MB/s: Global Cache | 48.94 | 43.04; Parallel Query | .00 | .00; DB Locks | 4.99 | 5.23; DB Streams | .00 | .00; Other | .00 | .01. Instance | Latency 500B MSG | Latency 8K MSG: 1 | 0.14 | 0.13; 2 | 0.58 | 0.69
    25. Storage Performance - Redo Log
    26. No Storage Cache. Implications: • Excessive IO will impact latency • Online redo logs are on SSD • Tune DBWR processes (MTTR target)
    27. SSD • 4x 73GB • Dedicated to redo logs • Reminder: 0.025ms read, 0.250ms write (best case) • Writes are not just writes • Over-provisioning
    28.
    29. SSD for Redo • Not a general recommendation • Consistent low latency • Works well for multiple databases • Leftover space
    30. ODA: SSD Performance for LGWR
    31. More LGWR Performance. Saturating LGWR Test: • 3200 writes, 2 nodes, 0.2ms latency • LGWR spent 70% of time on CPU. SwingBench Order Entry: • 4500 TPS • Bottleneck was buffer busy contention. Big data load: • 100K size writes, several ms latency • Data warehouse load – bad fit for ODA
    32. Storage Performance - Data
    33. HDD Performance. We tested: • HDD Scalability • Effects of disk placement • Backups!
    34. ODA Small Random Reads - HDD Scalability
    35. ODA Write IO Impact - Minimal
    36. ODA Write IO Impact - Minimal
    37. ODA Small Random Reads: Data Placement
    38. Co-locating data onto the outer 40% of a disk adds 50% more IOPS
    39. ODA Sequential Reads Scalability (Single node). I could reach 2.4 GBPS with 24 parallel reads for a single stream
    40. RMAN Backup Performance (1). Backup to FRA: • Optimal number of channels - 8 • 42 GB of data in 1 min 45 seconds = 400 MBPS • 1.6 TB full backup in about 1 hour
    41. RMAN Backup Performance (2). Backup to external location: • BACKUP VALIDATE with 8 channels • 42 GB of data in 45 seconds = 1 GBPS • Theoretical maximum wire speed for one 10 GbE link • 4 TB database in 1 hour 15 minutes
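Those backup throughput claims check out arithmetically; here is a quick sanity sketch (my own arithmetic, using only the figures on the two slides above):

```python
# Backup throughput = data moved / elapsed time, then extrapolate to full backups.
fra_rate = 42 / 105              # GB/s: 42 GB in 1 min 45 s -> 0.4 GB/s (400 MBPS)
ext_rate = 42 / 45               # GB/s: 42 GB in 45 s -> ~0.93 GB/s ("1 GBPS")
full_1_6tb_min = 1600 / fra_rate / 60   # ~67 min: "about 1 hour" to FRA
full_4tb_min = 4000 / ext_rate / 60     # ~71 min: "1 hour 15 minutes" external
print(round(fra_rate, 2), round(ext_rate, 2),
      round(full_1_6tb_min), round(full_4tb_min))
```

The external-location rate sits close to the wire speed of a single 10 GbE link (~1.25 GB/s), which is why more channels would not help much there.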
    42. Configurations of note:
    43. Capacity Planning for Migration or Consolidation
    44. Choosing Consolidation Candidates • Vendor limitations • SLAs • Dependencies • CPU utilization • Workload type. Big Question: Will it fit?
    45. Collect metrics • CPU utilization • Memory usage – SGA + PGA • Storage requirements • Workload types • I/O requirements – IOPS, throughput • RAC – current interconnect load
    46. CPU. Build a time-based model of utilization on existing servers: Time | S1 (8 core) | S2 (4 core) | S3 (32 core) | Total: 00:00 | 50% | 25% | 10% | 8*0.5 + 4*0.25 + 32*0.1 = 8.2; 00:15 | 30% | 50% | 10% | 7.6; 00:30 | 100% | 25% | 10% | 12.2. We calculated 12.2 cores in use at peak time; ODA's 24 cores give plenty of spare capacity. You can get more accurate results by taking core speed into account. This is a rough model.
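The model on this slide reduces to a weighted sum of core counts and utilization per sample interval; a minimal sketch using the slide's own server sizes and numbers:

```python
# Weighted CPU model: cores in use = sum(cores * utilization) per interval.
servers = [8, 4, 32]                 # S1, S2, S3 core counts
samples = {                          # utilization per 15-minute sample
    "00:00": [0.50, 0.25, 0.10],
    "00:15": [0.30, 0.50, 0.10],
    "00:30": [1.00, 0.25, 0.10],
}
totals = {t: sum(c * u for c, u in zip(servers, util))
          for t, util in samples.items()}
peak = max(totals.values())
print(round(peak, 1))                # 12.2 cores at peak, vs. ODA's 24
```

The consolidation question "will it fit?" then becomes whether the peak of the summed curve stays comfortably under 24.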
    47. Memory • Easiest way: sum memory on existing servers • Actually: sum SGA and PGA sizes, and leave 20-30% spare. Use advisors: • OEM gives graphs with SGA and PGA size recommendations.
    48. IO Capacity • OLTP and DWH go in separate boxes • Each can be a standby of the other • Consider throughput and latency requirements • According to our tests: 12K redo IOPS at 0.5 ms latency; over 3000 data file IOPS at 15ms latency; almost 6000 if using the outside of the disks only; can reach 2.4GBPS
    49. Disk Space • High redundancy – triple data usage • Can use external storage if needed • ZFS supports HCC • Take backups into account
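The "triple data usage" bullet is easy to quantify from the storage slide earlier (my arithmetic, assuming for illustration that all 20 HDDs sit in one high-redundancy disk group):

```python
# High redundancy keeps three copies of everything, so usable = raw / 3.
hdd_count, hdd_gb = 20, 600
raw_tb = hdd_count * hdd_gb / 1000   # 12 TB of raw HDD capacity
usable_tb = raw_tb / 3               # ~4 TB usable under high redundancy
print(raw_tb, usable_tb)             # 12.0 4.0
```

This is why the disk-space bullet and the external-storage option matter: high redundancy costs two thirds of raw capacity before backups are even counted.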
    50. Testing • Always test • Bad tests are still better than no tests • Replicating production load: RAT, "Brewing Benchmarks", JMeter, LoadRunner, etc. • Especially test: migration strategy and times, non-RAC applications going to RAC, upgrades
    51. Oracle Database Appliance requires 11.2.0.2. We will upgrade and migrate your DB to ODA for free
    52. Thank you and Q&A. To contact us: Gwen Shapira – Alex Gorbachev – 1-877-PYTHIAN. To follow us: @pythian @pythianjobs