Introduction…
• Karl Arao, OCP‐DBA, RHCT
• Senior Consultant at SQL*Wizard
• RAC user for 3 years
• 1st environment on VMware
• I “heart” performance
• Don’t like to guess when troubleshooting
Some 10g Performance Features
• OEM Performance Page
• ADDM
• SQL Tuning Advisor
• AWR (DBA_HIST_)
• ASH
• Time Model (total time for all db calls)
• Wait Class (12 wait classes)
• Metrics (v$ performance metric deltas)
• Services
Setup
• Server and Storage: SunFire X4200 (2 CPUs, 12 GB memory) with LUNs on EMC CX300
• OS: RHEL 4.3 ES
• Database and clusterware: Oracle 10.2.0.3
• Database files, Flash Recovery Area, OCR, and voting disk are located on OCFS2 filesystems
• Application: Forms and Reports (6i and earlier)
2. Checked the DB environment
This could be because of:
1) Clients running older versions (earlier than SQL*Plus 8.1 or OCI8, see Note 97926.1) that may not support TAF (FAILOVER_MODE) and load balancing (LOAD_BALANCE)
OR
2) Clients using TNS entries that connect explicitly to server1
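The second cause can be checked by reading the client's tnsnames.ora. Below is a minimal sketch, assuming a simplified regex scan rather than a full parser. The two entries are made up for illustration: the alias names, the host server2, and the service name ORCL are hypothetical; only server1, LOAD_BALANCE, and FAILOVER_MODE come from the slide.

```python
import re

# Hypothetical entry that pins every connection to one node.
PINNED = """
ORCL_PINNED =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = server1)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ORCL)))
"""

# Hypothetical entry that lists both nodes and enables load balancing and TAF.
BALANCED = """
ORCL_RAC =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = yes)
      (ADDRESS = (PROTOCOL = TCP)(HOST = server1)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = server2)(PORT = 1521)))
    (CONNECT_DATA =
      (SERVICE_NAME = ORCL)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))))
"""


def describe_entry(entry: str) -> dict:
    """Summarize the hosts and RAC-related settings found in one TNS entry."""
    hosts = re.findall(r"HOST\s*=\s*([\w.\-]+)", entry, flags=re.IGNORECASE)
    load_balance = re.search(
        r"LOAD_BALANCE\s*=\s*(yes|on|true)", entry, flags=re.IGNORECASE
    )
    failover_mode = re.search(r"FAILOVER_MODE\s*=", entry, flags=re.IGNORECASE)
    return {
        "hosts": hosts,
        "load_balance": load_balance is not None,
        "taf": failover_mode is not None,
        # A single distinct host means the client always lands on that node.
        "pinned": len(set(hosts)) == 1,
    }


for name, entry in (("pinned", PINNED), ("balanced", BALANCED)):
    print(name, describe_entry(entry))
```

An entry reported as pinned with host server1 would match cause 2 regardless of the client version.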
• How about graphing it in Excel? Will the data be more meaningful?
.. Yes, most of the users use the xxxlogin.fmx module
3. Checked instance‐wide DB performance
• Graphed the ASH data..
.. the database was suffering from “gc cr block lost” and “gc cr multi block request” waits from 7 am to 4 pm
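Before the ASH data can be graphed, the samples have to be counted per hour and wait event. Below is a minimal sketch of that aggregation, assuming the samples were already exported as (sample_time, event) pairs. The function name and the rows are placeholders for illustration, not data from this system.

```python
from collections import Counter
from datetime import datetime


def samples_per_hour(samples):
    """Count ASH samples per (hour of day, wait event)."""
    counts = Counter()
    for sample_time, event in samples:
        counts[(sample_time.hour, event)] += 1
    return counts


# Placeholder rows, for illustration only.
samples = [
    (datetime(2009, 1, 5, 7, 0, 10), "gc cr block lost"),
    (datetime(2009, 1, 5, 7, 0, 11), "gc cr block lost"),
    (datetime(2009, 1, 5, 7, 30, 0), "gc cr multi block request"),
    (datetime(2009, 1, 5, 15, 59, 59), "gc cr block lost"),
]

counts = samples_per_hour(samples)
for (hour, event), n in sorted(counts.items()):
    print(f"{hour:02d}:00  {event:<28} {n}")
```

Each (hour, event) count becomes one point in the graph, so a wait event that dominates a time window shows up as the tallest series there.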
• Searched Metalink for known issues..
Found Doc ID: 563566.1 gc lost blocks diagnostics
• Pinpointed the peak period from the graph, then generated ADDM and AWR reports for that period..
• ADDM
Elapsed Time: 60min
DB Time: 61.83min
AAS: 1.03
Max CPU: 2
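The AAS figure follows from the two numbers above it: average active sessions is DB time divided by elapsed time. Below is a short sketch of that arithmetic. The function name is illustrative; the inputs are the values on the slide.

```python
def average_active_sessions(db_time_min: float, elapsed_min: float) -> float:
    """AAS = DB time / elapsed time, both in the same unit."""
    return db_time_min / elapsed_min


aas = average_active_sessions(db_time_min=61.83, elapsed_min=60.0)
max_cpu = 2

print(f"AAS = {aas:.2f}")
print(f"AAS per CPU = {aas / max_cpu:.2f}")
```

An AAS of about 1 against 2 CPUs means that, on average, about one session was active at any moment. AAS counts sessions on CPU as well as sessions waiting, so the number alone does not say where the time went.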
• Should I follow these recommendations right away?
Nope, collect more facts, numbers, figures
• Do we have a workload distribution problem?
Nope, even with distributed users..
We still have a performance problem..
4. Checked session‐level DB performance
• The database has too much activity. Where do I start? Where do I drill down?
• gv$session_longops & gv$session_wait list too many users and require repetitive monitoring
• In the spirit of Method‐R…
"WORK FIRST TO REDUCE THE BIGGEST RESPONSE TIME COMPONENT OF A BUSINESS' MOST IMPORTANT USER ACTION"
• Went to the Accounting Department and checked on the desktop terminals
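The Method‐R principle quoted above amounts to ranking the response time components of one user action and starting with the largest. Below is a minimal sketch, assuming the time per component has already been measured. The function name and all numbers are hypothetical and only illustrate the ranking.

```python
def response_time_profile(components):
    """Return (name, seconds, percent) rows, largest contributor first."""
    total = sum(components.values())
    if total == 0:
        return []
    ranked = sorted(components.items(), key=lambda item: item[1], reverse=True)
    return [(name, secs, 100.0 * secs / total) for name, secs in ranked]


# Hypothetical measurements for a single user action, in seconds.
profile = response_time_profile({
    "CPU": 5.0,
    "gc cr block lost": 42.0,
    "db file sequential read": 3.0,
})

for name, secs, pct in profile:
    print(f"{name:<26} {secs:6.1f}s {pct:5.1f}%")
```

The first row of the profile is where the drill-down starts.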
• Users PC1069 (with SID 601) and PC918 (with SID 483) are completely hung
• Checked:
– the performance/wait counters
– the current SQL statements
5. Drilled down on the network interconnect
• Built a “cat & egrep” command to look for problems in the interconnect from the OS Watcher “netstat” output
(from Metalink Doc ID: 563566.1 gc lost blocks diagnostics)
$ cat server1_netstat.dat | egrep -i "udpInOverflows|packet receive errors|fragments dropped|reassembles failed|fragments dropped after timeout"
34096 fragments dropped after timeout
306030 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306268 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306574 packet reassembles failed
… output snipped …
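The counters in this output are cumulative, so the useful signal is how much each one grows between snapshots. Below is a minimal sketch of that comparison. It assumes the egrep output lists the snapshots in time order and that each counter appears at most once per snapshot; the function name is illustrative. The input is the set of lines shown above.

```python
import re
from collections import defaultdict

# A counter line looks like "<number> <counter name>".
COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def counter_deltas(netstat_text):
    """Map each counter name to its increases between successive snapshots."""
    history = defaultdict(list)
    for line in netstat_text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            history[match.group(2)].append(int(match.group(1)))
    return {
        name: [later - earlier for earlier, later in zip(values, values[1:])]
        for name, values in history.items()
    }


output = """\
34096 fragments dropped after timeout
306030 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306268 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306574 packet reassembles failed
"""

deltas = counter_deltas(output)
for name, steps in deltas.items():
    print(f"{name}: {steps}")
```

In the lines shown, only “packet reassembles failed” keeps growing from one snapshot to the next. The other two counters stay flat.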
Conclusion
You don’t have to guess..
Even if it’s a RAC environment..
It just takes facts, numbers, figures
to solve a performance problem
References and Tools
• http://karlarao.wordpress.com
• http://blog.tanelpoder.com
– http://www.tanelpoder.com/files/TPT_public.zip
– http://www.tanelpoder.com/files/PerfSheet.zip
– Neil Gunther & Tanel Poder, Multidimensional Visualization of Oracle Performance using Barry007, http://arxiv.org/pdf/0809.2532
• http://ashmasters.com
• http://www.perfvision.com
• http://www.method-r.com
• Metalink Doc ID 97926.1 Failover Issues and Limitations [Connect-time failover and TAF]
• Metalink Doc ID 563566.1 gc lost blocks diagnostics
• Metalink Doc ID 301137.1 OS Watcher User Guide