RAC Performance Tuning (ILOUG)
Presentation Transcript

  • Oracle RAC
    David Yahalom, CTO, Naya Technologies
    www.naya-tech.co.il | davidy@naya-tech.co.il
  • Oracle RAC Architecture (diagram: cluster nodes behind a firewall, linked by a private interconnect and attached to shared storage)
  • Oracle RAC Architecture (diagram: clients reach VIPs and services over the public network; each node runs a listener, an instance, ASM, and Oracle Clusterware on the operating system; shared storage holds the redo/archive logs of all instances, the database/control files, and the OCR and voting disks, managed by ASM)
  • Let’s get some terminology out of the way… Global Cache Service (GCS)
  • Global Cache Service (GCS)
    •  Manages coherent access to data in the buffer caches of all instances in the cluster.
    •  Minimizes access time to data that is not in the local cache: access to data in the global cache is faster than disk access.
    •  Implements fast direct memory access over high-speed interconnects, for all data blocks and types (current for writes, CR for reads).
    •  Uses an efficient and scalable messaging protocol: never more than 3 hops.
    •  Optimizations for read-mostly applications.
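    GCS activity is visible from the standard dynamic performance views. A minimal sketch, assuming a user with access to the GV$ views (these statistic names are the 10g-and-later forms; 9i uses "global cache …" names):

      -- Cumulative Cache Fusion block transfers, per instance
      SELECT inst_id, name, value
        FROM gv$sysstat
       WHERE name IN ('gc cr blocks received',     'gc cr blocks served',
                      'gc current blocks received', 'gc current blocks served')
       ORDER BY inst_id, name;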
  • Cache Hierarchy: Data in Remote Cache (diagram: a local cache miss sends a data block request to the remote LMS; on a remote cache hit, the LMS returns the data block)
  • Let’s get some terminology out of the way… Oracle block “master”.
  • Oracle RAC block master:
    •  The master can be thought of as the directory node for a block or an object.
    •  The global state of the data block – whether it is cached or on disk, which instances have the block cached, and whether the block can be shared immediately or has a modification pending – is completely known at the master.
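    Block mastering can be inspected directly. A hedged sketch using V$GCSPFMASTER_INFO (available in 10g and later; column names vary slightly by release, so verify with DESCRIBE before relying on this):

      -- Which instance currently masters each cached object,
      -- and how often mastership has moved (remastered)
      SELECT data_object_id, current_master, previous_master, remaster_cnt
        FROM v$gcspfmaster_info
       ORDER BY remaster_cnt DESC;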
  • Cache Hierarchy: Data On Disk (diagram: a local cache miss sends a data block request to the LMS; on a remote cache miss, a grant is returned and the requester performs a disk read)
  • GC Current block 2-way
  • GC Current block 3-way
  • What can go wrong?
  • Common Problems and Symptoms
    •  “Lost blocks”: interconnect or switch problems.
    •  Slow or bottlenecked disks: one node becomes a bottleneck, the entire cluster waits.
    •  System load and scheduling: high CPU leads to “frozen” LMS processes.
    •  Contention: frequent access to the same resources.
    •  Unexpectedly high latencies: network issues.
  • Best practice #1: Tune the interconnect. Often overlooked, but always important.
    •  Dropped packets/fragments.
    •  Buffer overflows / high load on NICs.
    •  Packet reassembly failures or timeouts.
    •  TX/RX errors.
    •  Verify low utilization.
  • “Lost Blocks”: NIC Receive Errors

    ifconfig -a:
    eth0  Link encap:Ethernet  HWaddr 00:0B:DB:4B:A2:04
          inet addr:130.35.25.110  Bcast:130.35.27.255  Mask:255.255.252.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21721236 errors:135 dropped:0 overruns:0 frame:95
          TX packets:273120 errors:0 dropped:0 overruns:0 carrier:0
          …

    Overruns indicate that the NIC's internal buffers should be increased, while dropped may indicate that the driver and OS layers cannot drain the queued messages fast enough.
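    Beyond ifconfig, driver-level counters and kernel reassembly statistics are worth checking. A sketch for Linux, assuming the interconnect is on eth0 (adjust the interface name for your system):

      # NIC driver statistics: look for error, drop, and overrun counters
      ethtool -S eth0 | grep -i -E 'err|drop|over'

      # Kernel IP/UDP statistics: "packet reassembles failed" here points
      # to fragmentation/reassembly problems on the interconnect
      netstat -s | grep -i -E 'reassembl|fragment'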
  • Finding a Problem with the Interconnect

    Top 5 Timed Events                           Avg wait  %Total Call
    Event             Waits     Time(s)   (ms)   Time      Wait Class
    ----------------  --------  --------  -----  --------  ----------
    log file sync      286,038    49,872    174      41.7  Commit
    gc buffer busy     177,315    29,021    164      24.3  Cluster
    gc cr block busy   110,348     5,703     52       4.8  Cluster
    gc cr block lost     4,272     4,953  1,159       4.1  Cluster  <- should never be here
    cr request retry     6,316     4,668    739       3.9  Other    <- should never be here
  • Interconnect Statistics: Automatic Workload Repository (AWR)

    Target     Avg Latency  Stddev    Avg Latency  Stddev
    Instance   500B msg     500B msg  8K msg       8K msg
    --------   -----------  --------  -----------  ------
           1           .79       .65         1.04    1.06
           2           .75       .57          .95     .78
           3           .55       .59          .53     .59
           4          1.59      3.16         1.46    1.82

    Latency probes for different message sizes.
    Exact throughput measurements (not shown).
    Send and receive errors, dropped packets (not shown).
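    It is also worth confirming that the instances actually use the intended private network. A minimal sketch (V$CLUSTER_INTERCONNECTS is available in 10g and later):

      -- Which NIC/IP each instance uses for cluster traffic,
      -- and where the setting came from (OCR, init.ora, OS dependent)
      SELECT inst_id, name, ip_address, is_public, source
        FROM gv$cluster_interconnects;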
  • Interconnect latency

    Event                     Waits    Time (s)  AVG (ms)  % Call Time
    ----------------------  --------  --------  --------  -----------
    gc cr block 2-way        317,062     5,767        18         19.0
    gc current block 2-way   201,663     4,063        20         13.4
    gc buffer busy           111,372     3,970        36         13.1
    CPU time                             2,938                    9.7
    gc cr block busy          40,688     1,670        41          5.5

    Expected: seeing 2-way and 3-way events.
    Unexpected: seeing > 1 ms (AVG ms should be around 1 ms).
    Cause: high load, slow interconnect, contention…
    Tackle latency first, then tackle the busy events.
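    The same averages can be tracked outside of AWR from the cumulative wait statistics. A minimal sketch:

      -- Average latency of cluster wait events since instance startup;
      -- 2-way/3-way averages well above ~1 ms deserve investigation
      SELECT inst_id, event, total_waits,
             ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_ms
        FROM gv$system_event
       WHERE wait_class = 'Cluster'
       ORDER BY time_waited_micro DESC;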
  • Cache Fusion messaging traffic

    Global Cache Load Profile
    ~~~~~~~~~~~~~~~~~~~~~~~~~         Per Second  Per Transaction
    Global Cache blocks received:           4.30             3.65
    Global Cache blocks served:            23.44            19.90
    GCS/GES messages received:            133.03           112.96
    GCS/GES messages sent:                 78.61            66.75
    DBWR Fusion writes:                     0.11             0.10
    Est Interconnect traffic (KB):        263.20

    Network traffic received  = Global Cache blocks received * DB block size
                              = 4.30 * 8192  ≈ 0.03 MB/sec
    Network traffic generated = Global Cache blocks served * DB block size
                              = 23.44 * 8192 ≈ 0.19 MB/sec
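    The same estimate can be produced straight from the instance statistics. A minimal sketch (values are cumulative since startup; divide by uptime for a per-second rate, and note that GCS/GES messages add some overhead on top):

      -- Estimated Cache Fusion data volume in MB, using the block size
      -- from v$parameter
      SELECT s.name,
             ROUND(s.value * TO_NUMBER(p.value) / 1024 / 1024, 2) AS est_mb
        FROM v$sysstat s, v$parameter p
       WHERE p.name = 'db_block_size'
         AND s.name IN ('gc cr blocks received', 'gc current blocks received',
                        'gc cr blocks served',   'gc current blocks served');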
  • What to do?
    •  Dedicated interconnect NICs and switches.
    •  Tune IPC buffer sizes.
    •  Ensure enough OS resources are available: a spinning process can consume all network ports.
    •  Disable any firewall on the interconnect.
    •  Use “Jumbo Frames” where supported (see the sketch below).
    •  Make sure network utilization stays low (below roughly 20%).
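    For the jumbo frames point, a sketch of the usual Linux-side settings, assuming a hypothetical eth1 private interconnect; every NIC and switch port on the interconnect path must support an MTU of 9000 for this to be safe:

      # Temporary, until reboot
      ip link set eth1 mtu 9000

      # Persistent (Red Hat-style systems): add to the interface config
      # /etc/sysconfig/network-scripts/ifcfg-eth1
      MTU=9000

      # Verify end to end with a non-fragmenting ping between nodes:
      # 8972 payload + 28 bytes of IP/ICMP headers = 9000
      ping -M do -s 8972 <remote-private-ip>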
  • Best practice #2: I/O is critical to RAC. Storage is global to the cluster, and a single badly behaving node or badly balanced disk configuration can affect the disk read and write performance of all nodes.
  • •  Log flush IO delays can cause “busy” buffers: LGWR always writes before a block changes ownership, so bad LGWR latency means bad overall RAC performance.
    •  “Bad” queries on one node can saturate a disk where the redo logs are located.
    •  IO is issued from ALL nodes to shared storage.
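    LGWR health is easy to check from the wait interface. A minimal sketch using the event histogram, which shows whether log writes are consistently fast or have a long tail:

      -- Latency distribution for log writes; buckets are "< wait_time_milli ms".
      -- A healthy RAC system wants these counts in the low single-digit buckets.
      SELECT event, wait_time_milli, wait_count
        FROM v$event_histogram
       WHERE event IN ('log file parallel write', 'log file sync')
       ORDER BY event, wait_time_milli;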
  • Cluster-Wide I/O Impact

    Node 1 – Top 5 Timed Events                  Avg wait  %Total Call
    Event             Waits     Time(s)   (ms)   Time
    ----------------  --------  -------   ----   ----
    log file sync     286,038   49,872    174    41.7   <- expensive query on Node 2 impacts Node 1
    gc buffer busy    177,315   29,021    164    24.3
    gc cr block busy  110,348    5,703     52     4.8

    Node 2 – Load Profile     Per Second
    ~~~~~~~~~~~~~~~~~~~~~     ----------
    Redo size:                 40,982.21
    Logical reads:             81,652.41
    Physical reads:            51,193.37

    1. IO on the disk group containing the redo logs is bottlenecked.
    2. Block shipping for hot blocks is delayed by log flush IO.
    3. Serialization/queues build up.
  • Drill-down on node 2: an IO capacity problem

    Top 5 Timed Events                                 Avg wait  %Total Call
    Event                      Waits       Time(s)  (ms)   Time   Wait Class
    -------------------------  ----------  -------  ----   ----   ----------
    db file scattered read      3,747,683  368,301    98   33.3   User I/O
    gc buffer busy              3,376,228  233,632    69   21.1   Cluster
    db file parallel read       1,552,284  225,218   145   20.4   User I/O
    gc cr multi block request  35,588,800  101,888     3    9.2   Cluster
    read by other session       1,263,599   82,915    66    7.5   User I/O  <- I/O contention
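    To find the offending query, ranking the shared pool by physical reads is usually enough. A minimal sketch, run on the suspect node:

      -- Top 10 SQL statements by physical reads on this instance
      SELECT *
        FROM (SELECT sql_id, disk_reads, executions,
                     ROUND(disk_reads / NULLIF(executions, 0)) AS reads_per_exec,
                     SUBSTR(sql_text, 1, 60) AS sql_text
                FROM v$sql
               ORDER BY disk_reads DESC)
       WHERE ROWNUM <= 10;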
  • After “killing” the session…

    Top 5 Timed Events                             Avg wait  %Total Call
    Event                    Waits     Time (s)  (ms)   Time    Wait Class
    -----------------------  --------  --------  ----   -----   ----------
    CPU time                              4,580          65.4
    log file sync            276,281      1,501     5    21.4   Commit
    log file parallel write  298,045        923     3    13.2   System I/O
    gc current block 3-way   605,628        631     1     9.0   Cluster
    gc cr block 3-way        514,218        533     1     7.6   Cluster

    1. Log file writes are normal.
    2. Global serialization has disappeared.
  • What to do?
    •  Tune the IO layout: RAC is much more sensitive to full table scans, full index scans, etc.
    •  Tune queries consuming a lot of IO.
    •  One busy node can affect the entire cluster.
    •  Separate the storage of redo log files and data files.
    •  Make sure Async I/O is enabled (see the sketch below)!
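    For the async I/O point, a sketch of the relevant parameters; note that FILESYSTEMIO_OPTIONS only matters for file-system datafiles, while ASM uses async I/O by default:

      -- Check the current settings (SQL*Plus)
      SHOW PARAMETER disk_asynch_io
      SHOW PARAMETER filesystemio_options

      -- Enable both async and direct I/O for file-system files;
      -- requires an instance restart
      ALTER SYSTEM SET filesystemio_options = 'SETALL' SCOPE = SPFILE;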
  • Best practice #3: single node CPU load matters

    If an LMS process cannot get scheduled to process the messages arriving in its request queue, the time it spends in the run queue adds to the data access time of users on other nodes.

    Top 5 Timed Events                                Avg wait  %Total Call
    Event                       Waits    Time(s)  (ms)   Time    Wait Class
    --------------------------  -------  -------  ----   -----   ----------
    gc current block congested  275,004   21,054    77    21.3   Cluster
    gc cr grant congested       177,044   13,495    76    13.6   Cluster
    gc cr block congested        85,975    8,917   104     9.0   Cluster

    Congested: LMS could not dequeue messages fast enough.
    Cause: long run queue, CPU starvation.
    Solution: high process priority for LMS, or start more LMS processes (see the sketch below). Never use more LMS processes than CPUs.
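    The LMS count is controlled by an init parameter. A sketch (the value 4 is illustrative; the parameter is static, so a restart is required, and the rule above still applies: never more LMS processes than CPUs):

      -- Check how many LMS processes each instance runs (SQL*Plus)
      SHOW PARAMETER gcs_server_processes

      -- Raise it cluster-wide; takes effect after instance restart
      ALTER SYSTEM SET gcs_server_processes = 4 SCOPE = SPFILE SID = '*';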
  • Best practice #4: avoid block contention

    Event                     Waits    Time (s)  AVG (ms)  % Call Time
    ----------------------  --------  --------  --------  -----------
    gc cr block 2-way        317,062     5,767        18         19.0
    gc current block 2-way   201,663     4,063        20         13.4
    gc buffer busy           111,372     3,970        36         13.1
    CPU time                             2,938                    9.7
    gc cr block busy          40,688     1,670        41          5.5

    •  Any frequently accessed data may have hotspots that are sensitive to how many users access the same data concurrently.
    •  It is very likely that GC CR BLOCK BUSY and GC BUFFER BUSY are related.
    •  RAC can magnify a resource bottleneck.
    •  Identify “hot” blocks and reduce concurrency (see the sketch below).
    •  If possible, “partition” the application workload.
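    Hot segments can be identified from the segment-level statistics. A minimal sketch (in 10g the statistic is 'gc buffer busy'; 11g splits it into 'gc buffer busy acquire' and '… release', which the LIKE below covers):

      -- Segments suffering the most global cache buffer busy waits
      SELECT owner, object_name, object_type, value
        FROM v$segment_statistics
       WHERE statistic_name LIKE 'gc buffer busy%'
         AND value > 0
       ORDER BY value DESC;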
  • Best practice #5: smart application design
    •  No fundamentally different design and coding practices are needed for RAC. BUT:
    •  Flaws in execution or design have a higher impact in RAC: performance and scalability will be more sensitive to bad plans or bad schema design.
    •  Serializing contention makes applications less scalable.
    •  Standard SQL and schema tuning solves > 80% of performance problems.
  • Major scalability pitfalls
    •  Serializing contention on a small set of data/index blocks:
       •  a monotonically increasing index (sequence numbers) is not scalable when the index is modified from all nodes.
       •  frequent updates of small cached tables (“hot blocks”).
       •  sparse blocks (PCTFREE 99) will reduce serialization.
    •  Concurrent DDL and DML (frequent invalidation of cursors = many data dictionary reads and syncs).
    •  Segments without automatic segment space management (ASSM) or Free List Groups (FLG).
    •  Sequence caching (see the sketch below).
    •  Full table scans: direct reads do not need to be globally synchronized (hence less CPU for the global cache).
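    For the sequence pitfall, the usual fix is a large cache and no ordering guarantee, so each instance hands out numbers from its own cached range without cross-node synchronization. A sketch with an illustrative sequence name and cache size:

      -- ORDER would force cross-instance coordination on every NEXTVAL;
      -- NOORDER plus a large cache keeps sequence allocation node-local
      CREATE SEQUENCE order_seq CACHE 1000 NOORDER;

      -- Or fix an existing sequence still on the default CACHE 20
      ALTER SEQUENCE order_seq CACHE 1000;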