Whitepaper: Mining the AWR repository for Capacity Planning and Visualization



Published in: Technology


Mining the AWR Repository for Capacity Planning, Visualization, and Other Real World Stuff
Karl Arao, Oracle ACE, OCP-DBA, RHCE
karlarao@gmail.com

PART 1 - Where It All Started

I'm certain that DBAs and developers don't have enough time to spare to read 108 AWR reports in just an hour. How about 108 AWR reports in 5 minutes, just to answer the question: what is the bottleneck, and in what period? Before, it took so much of my time just to generate these AWR reports by hand, and it is a daunting and repetitive execution of awrrpt.sql. You will be overwhelmed by the manual generation, especially when you start reading each report, because you only need to see particular sections of the 1000+ lines of performance data to correlate them with the problem at hand. This definitely leads to longer analysis periods, and hence a longer time for the problem to be solved. As a database consultant I must also be aware of how to optimize my troubleshooting time. You can argue that there are already visual tools and software available to help with troubleshooting, but what if you are only left with a command line or an SQL*Plus session?

New CPUs and storage arrays are getting faster, but these resources are finite and come at a cost. Hence, capacity planning plays a very important role in ensuring that proper resources are available to handle expected and unexpected workloads. Another critical matter for DBAs and IT managers is justifying the expense of adding resources to the system. With guesswork you'll end up getting the most expensive hardware. With proper measurement, planning, and management of growth, you'll be able to get just the right hardware for your workload, with allowance for a particular growth period. This results in huge savings for the company and a happier IT shop.

AWR is a built-in data store that started in 10gR1 and improved significantly in 11gR2. It is very much like a "Statspack on steroids", giving you a far better view of workload information across all the AWR snapshots. From the AWR data samples we can build reports that let us notice trends, visualize the data, and apply statistical methods for analysis. Even more surprising, the AWR data samples let us define a database server's Capacity, Requirements, and Utilization in terms of CPU, IO, memory, and network. These are key metrics for Capacity Planning.

In this paper you will learn how to make use of the AWR, specifically the DBA_HIST views, to get clear-cut measurements for Capacity Planning, Prediction, Analysis, and Performance Firefighting.
This scenario triggered me to mine the source tables of the AWR report. The goal was to cut out the unnecessary parts and present the performance data in a more meaningful manner, so that it is easier for me to notice trends, visualize the data, or even do some statistics on it.

PART 2 - How to Mine the AWR

AWR is much like "Statspack on steroids": it is a wonderful data collector for Oracle and OS statistics. The underlying sources of the AWR report are the DBA_HIST views, which have grown from 67 in Oracle version 10.1 to 108 in Oracle version 11.2.

The AWR report provides a single summary report based upon an interval of time. In the image below, we create an AWR report for SNAP_ID 335 to 339, an interval from 6:20 to 7:01 AM.

However, when analyzing performance problems we are more interested in what occurred during each of the samples (335, 336, 337, 338, and 339) within the specified interval. That way we get a granular view of what's going on and a better view of the workload changes that are happening.

For the query output, we are investigating the SYSSTAT statistic "physical reads", which is the total number of data blocks read from disk. It is also important to note that this is the cumulative number of physical reads by all database sessions since instance startup. We are particularly interested in the delta of each SNAP_ID (end_value - start_value = delta), and in transforming that delta into a more meaningful and readable output.

To transform the delta into output that we can easily understand, we apply the IO MB/s formula. See the example for SNAP_ID 338 below:

    IO MB/s = ( (delta * <block_size>) / 1024 / 1024 ) / <snap_duration_in_seconds>
            = ( (5663126 * 8192) / 1024 / 1024 ) / 603
            = 73.37 MB/s

To validate the accuracy of the derived value, we compare it with the actual AWR report on SNAP_ID 338 - 339. The image below shows that the delta we used to derive the MB/s is correct.

Also, a run of the Automatic Database Diagnostic Monitor (ADDM) on SNAP_ID 338 - 339 shows that we are reaching a throughput of 74 MB/s. This is really close to our derived value.
Checking it against the Enterprise Manager Performance page also shows that the Disk IO is around our derived value.

The data shown above comes from the query below:

    SELECT * FROM
    ( SELECT s0.snap_id snap_id,
             TO_CHAR(s0.END_INTERVAL_TIME,'YY/MM/DD HH24:MI') TIME,
             s10t0.stat_name,
             s10t0.value start_value,
             s10t1.value end_value,
             (s10t1.value - s10t0.value) delta,
             round(((((s10t1.value - s10t0.value)* 8192)/1024/1024)  / ((round(EXTRACT(DAY FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 1440
                                                                             + EXTRACT(HOUR FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) * 60
                                                                             + EXTRACT(MINUTE FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME)
                                                                             + EXTRACT(SECOND FROM s1.END_INTERVAL_TIME - s0.END_INTERVAL_TIME) / 60, 2))*60)
             ),2) as phyreads_mbps
      FROM dba_hist_snapshot s0,
           dba_hist_snapshot s1,
           dba_hist_sysstat s10t0,      -- physical reads, diffed
           dba_hist_sysstat s10t1
      WHERE s0.dbid               = 2607950532    -- DBID
      AND s1.dbid                 = s0.dbid
      AND s10t0.dbid              = s0.dbid
      AND s10t1.dbid              = s0.dbid
      AND s0.instance_number      = 1             -- INSTANCE_NUMBER
      AND s1.instance_number      = s0.instance_number
      AND s10t0.instance_number   = s0.instance_number
      AND s10t1.instance_number   = s0.instance_number
      AND s1.snap_id              = s0.snap_id + 1
      AND s10t0.snap_id           = s0.snap_id
      AND s10t1.snap_id           = s0.snap_id + 1
      AND s10t0.stat_name         = 'physical reads'
      AND s10t1.stat_name         = s10t0.stat_name
    )
    WHERE snap_id in (335,336,337,338,339)
    ORDER BY snap_id ASC;
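The following minimal Python sketch makes the phyreads_mbps column easier to follow by reproducing its arithmetic. The function names are hypothetical.

- `snap_duration_seconds` mirrors the query's denominator. It expresses the gap between two END_INTERVAL_TIME values in minutes, rounds that to two decimals, and multiplies by 60.
- `io_mbps` applies the IO MB/s formula to the "physical reads" delta, assuming 8 KB blocks as the query does.

The timestamps are made up and chosen 603 seconds apart to match the SNAP_ID 338 example.

```python
from datetime import datetime


def snap_duration_seconds(begin: datetime, end: datetime) -> float:
    """Mirror the query: interval in minutes, rounded to 2 decimals, times 60."""
    delta = end - begin
    minutes = (delta.days * 1440                      # EXTRACT(DAY ...) * 1440
               + delta.seconds / 60                   # HOUR * 60 + MINUTE + SECOND / 60
               + delta.microseconds / 60_000_000)     # fractional seconds
    return round(minutes, 2) * 60


def io_mbps(delta_blocks: int, block_size: int, seconds: float) -> float:
    """IO MB/s = ((delta * block_size) / 1024 / 1024) / snap duration in seconds."""
    return delta_blocks * block_size / 1024 / 1024 / seconds


# Hypothetical END_INTERVAL_TIME values for SNAP_ID 338 and 339, 603 seconds apart
duration = snap_duration_seconds(datetime(2010, 8, 22, 6, 50, 0),
                                 datetime(2010, 8, 22, 7, 0, 3))
print(duration)                                     # 603.0
print(round(io_mbps(5663126, 8192, duration), 2))   # 73.37
```

Because the minutes are rounded to two decimals, the computed duration can be off by up to 0.3 seconds. That error is negligible for snapshots several minutes long.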
You may have noticed that I used the SQL trick below, which has a similar effect to the LAG function. It lets the query get the start_value and end_value on a single row, making it possible to compute the delta value and apply the performance formula. The view DBA_HIST_SNAPSHOT also acts as the ultimate reference for snap information. It can be joined to the other DBA_HIST views to provide meaningful data on other subsystems or on workload performance.

      AND s10t0.snap_id           = s0.snap_id
      AND s10t1.snap_id           = s0.snap_id + 1

The query I've shown you is just one part of the story: it only gives the "IO Read MB/s", which is an IO subsystem statistic. Ideally we should correlate the following subsystems of the database server to fully characterize the overall workload and performance:

1) Oracle - Oracle instance and database configuration
2) Operating System - CPU, memory, IO, and network
3) Application - SQLs and anything specific to the application

For the correlation we use the "3-circle analysis" technique [1]. Each subsystem represents a circle and is diagnosed separately and then in combination. If the problem resides with the database server, the overlap of the 3 circles is the current performance problem.
By doing this we get a clear correlation of the workload and performance across subsystems, and we can target our efforts at improving the overall response time.

When mining the AWR, a query in a time series layout that shows only the relevant statistics side by side is very useful in various ways. Even when the statistics can't be shown side by side, each bottleneck period relates to a particular SNAP_ID, so correlating the various performance data is entirely possible!

This gives us the following advantages:
- Quickly notice trends for performance diagnosis
- The full set of workload and performance data is now in our control
- Lots of data points for statistical and predictive analysis
- Faster analysis than ever!
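The `snap_id + 1` self-join described earlier pairs each snapshot with the next one, so the cumulative counter can be differenced on a single row. The Python sketch below contrasts it with a LAG-style "previous row" difference. The function names and counter values are hypothetical.

- The self-join silently skips a pair when a snapshot is missing.
- A LAG-style difference would subtract across the gap.
- Either way, a counter reset after an instance restart would appear as a negative delta, because both methods simply subtract.

```python
def deltas_self_join(cumulative: dict[int, int]) -> dict[int, int]:
    """Like joining s10t0 (snap_id) to s10t1 (snap_id + 1): only consecutive IDs pair up."""
    return {
        snap_id: cumulative[snap_id + 1] - value
        for snap_id, value in cumulative.items()
        if snap_id + 1 in cumulative
    }


def deltas_lag(cumulative: dict[int, int]) -> dict[int, int]:
    """LAG-style: difference each snapshot against the previous one present.

    The result is keyed by the earlier snap_id, matching the query's s0.snap_id label.
    """
    ordered = sorted(cumulative.items())
    return {
        prev_id: cur_value - prev_value
        for (prev_id, prev_value), (_, cur_value) in zip(ordered, ordered[1:])
    }


# Made-up cumulative 'physical reads' values; snapshot 338 is missing
counters = {335: 1_000, 336: 1_600, 337: 2_500, 339: 4_100}
print(deltas_self_join(counters))  # {335: 600, 336: 900}
print(deltas_lag(counters))        # {335: 600, 336: 900, 337: 1600}
```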
As I went along with my research on mining the AWR, I created and collected some useful scripts that I have applied successfully in real world performance scenarios.

The chart below shows the categorical relationship of the scripts:

The table below shows the important details of the scripts, and some of the reasoning behind how they are formatted and created:

IMPORTANT NOTE: The Diagnostic Pack License is needed for the scripts.

awr_genwl
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
    Data presented: AAS, CPU capacity, CPU requirements, Memory requirements, IO requirements, Logged on users, CPU Utilization
    Description: This is the starting point. You first run this SQL to get an overview of the load of the database server. It clearly shows the relationship of the formula Utilization = Requirements / Capacity. The AAS column serves as a (golden) metric for finding the periods where the database could be having a bottleneck or is just idle.

awr_topevents
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SYSTEM_EVENT, DBA_HIST_SYS_TIME_MODEL
    Data presented: Event, Event Rank, Waits, Time, Avgwt (ms), DB Time %, AAS, Wait Class
    Description: This is a version of "Top 5 Timed Events", but across SNAP_IDs and with the AAS metric. Coming from awr_genwl's AAS metric, we must be aware of the components of AAS (much like drilling down on the time components) and have this kind of data over a period of time (across SNAP_IDs). Graphing this data gives a result much like Enterprise Manager, which outputs a nice graph slicing the AAS components into different wait classes. That gives you a broad "historical" view from which you can go back and drill down on past load activity.

awr_services
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SERVICE_STAT
    Data presented: Service Name, DB Time, DB CPU, Physical Reads, Logical Reads, AAS
    Description: Services enable the grouping of common database connections, or the distribution of connections (e.g. RAC). This data is commonly seen in Enterprise Manager, where it gives us a classification of the application/module activity on the database. Showing this data as a time series and adding an AAS column gives us an idea of whether particular applications are driving most of the workload of the database.
awr_sysstat
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
    Data presented: AAS, LIO/s, DB Block Changes/s, User Calls/s, Parses/s, Hard Parses/s, Sorts/s, Logon/s, SQL*Net to client MB, SQL*Net to dblink MB
    Description: This is a version of "Load Profile", but across SNAP_IDs and with the AAS metric. It is useful for quickly noticing changes in the Oracle workload. You may add any other SYSSTAT statistic you want to monitor here.

awr_topsqlx
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT, DBA_HIST_SQLTEXT
    Data presented: SQL_ID, Plan Hash Value, Module, Elapsed Time (s), Elapsed Time / exec (s), CPU Time (s), IO Time (s), App Time (s), Concurrency Time (s), Cluster Wait (s), LIO, PIO, Direct Writes, Rows, Exec, Parse Count, PX Exec, Time Rank, AAS, SQL_TEXT
    Description: The "SQL section" of the AWR report is usually segregated into sections ordered by the following:
        - Elapsed Time
        - CPU Time
        - Gets
        - Reads
        - Executions
        - Parse Calls
    When the data for a particular problematic SQL_ID is spread over 1000+ lines of report, it is hard to find every detail about its performance. I feel there's a better way to present the data. Here are the info/sections you'll get from the script, with a short description of each:

    1) snap_id, time, instance, snap duration
    The time period and snap_id can be used to show the SQLs for a given workload period. Let's say your usual work hours are 9am to 6pm: you could show just the SQLs in that period. There's a date range section at the bottom of the script that you can use if you want to filter.

    2) sql_id, plan_hash_value, module
    Use this info if you want to know where the SQL was executed (SQL*Plus, OWB, Toad, etc.). You could also compare the plan_hash_value, but I suggest you use Kerry Osborne's awr_unstable_plans.sql script if you'd like to search for unstable plans.

    3) total elapsed time, elapsed time per exec
       - cpu time
       - io time
       - app wait time
       - concurrency wait time
       - cluster wait time
    This is the time info. Even without tracing the SQL, you'd know which time component is consuming the elapsed time of that particular SQL.
    So let's say your total elapsed time is 1000 sec, with a CPU time of 30 sec and an IO time of 300 sec. You would know that the SQL is consuming significant IO, but you still have to look for the other 670 sec. That time could be attributed to "other" wait events (like PX Deq Credit: send blkd, etc.).

    4) - LIOs
       - PIOs
       - direct writes
       - rows
       - executions
       - parse count
       - PX
    These are some other statistics about the SQL: whether you're incurring a lot of PIOs, how many times the SQL was executed in that period, and the number of PX processes spawned. Just be careful with these numbers. If you have "executions" of, let's say, 8, you have to divide these values by 8, and likewise for the time section; only "elapsed time per exec" is a per-execution value. The values are left as totals for formatting reasons, because I can't fit them all on my screen.

    5) - AAS (Average Active Sessions)
       - Time Rank
       - SQL type, SQL text
    This is one of my favorites. It measures how the SQL is performing against the database server. I use AAS and CPU count as my yardstick for a possible performance problem (I suggest reading Kyle's material about this):

        if AAS < 1
          -- Database is not blocked
        AAS ~= 0
          -- Database basically idle
          -- Problems are in the APP not DB
        AAS < # of CPUs
          -- CPU available
          -- Database is probably not blocked
          -- Are any single sessions 100% active?
        AAS > # of CPUs
          -- Could have performance problems
        AAS >> # of CPUs
          -- There is a bottleneck

    So having AAS as another metric on the TOP SQL is good stuff. I've also added the "time rank" column to show the SQL's ranking in the top SQL. Normally the default settings of the script show time ranks 1 to 5. This is also useful when you are investigating a particular SQL at rank #15 and you see that an ad hoc query at time ranks #1 and #2 is affecting database performance.

    This script can also show SQLs that span across SNAP_IDs. Order the output by SNAP_ID and filter on that particular SQL. If the SQL is still running and spans, let's say, 2 SNAP_IDs, the exec count will be 0 (zero) and the elapsed time per exec will be 0 (zero). Only when the query finishes will you see these values populated. I've noticed this behavior, and it's the same thing that is shown in the AWR reports.
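The arithmetic behind items 3) to 5) can be sketched in Python. All helper names are hypothetical:

- `unaccounted_seconds` subtracts the reported time components from the total elapsed time.
- `per_exec` divides a total by the execution count. It returns None when executions is 0, as happens for a SQL still running across SNAP_IDs.
- `sql_aas` assumes that a SQL's AAS is its elapsed time divided by the snapshot length.
- `classify_aas` encodes the yardstick above.

"AAS ~= 0" and "AAS >> # of CPUs" are fuzzy in the original rule. The cut-offs used here (0.1, and twice the CPU count) are my own assumptions.

```python
from typing import Optional


def unaccounted_seconds(elapsed: float, cpu: float, io: float, app: float = 0.0,
                        concurrency: float = 0.0, cluster: float = 0.0) -> float:
    """Elapsed time not explained by the reported components ('other' waits)."""
    return elapsed - (cpu + io + app + concurrency + cluster)


def per_exec(total: float, executions: int) -> Optional[float]:
    """Per-execution value; None while no execution has completed in the interval."""
    return total / executions if executions else None


def sql_aas(elapsed_seconds: float, snap_seconds: float) -> float:
    """Assumed: a SQL's share of the load = its elapsed time / snapshot length."""
    return elapsed_seconds / snap_seconds


def classify_aas(aas: float, num_cpus: int) -> str:
    if aas < 0.1:                # assumed cut-off for "AAS ~= 0"
        return "basically idle: problems are in the app, not the DB"
    if aas < 1:
        return "database is not blocked"
    if aas < num_cpus:
        return "CPU available: database is probably not blocked"
    if aas > 2 * num_cpus:       # assumed cut-off for "AAS >> # of CPUs"
        return "there is a bottleneck"
    return "could have performance problems"


print(unaccounted_seconds(1000, cpu=30, io=300))  # 670.0
print(per_exec(1000, 8), per_exec(1000, 0))       # 125.0 None
load = sql_aas(1000, 600)                         # hypothetical 10-minute snapshot
print(round(load, 2), classify_aas(load, num_cpus=2))
```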
    For more on SQLs that span across SNAP_IDs, see http://karlarao.tiddlyspot.com/#%5B%5BTopSQL%20on%20AWR%5D%5D

awr_topsql
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT, DBA_HIST_SQLTEXT
    Data presented: SQL_ID, Plan Hash Value, Module, Elapsed Time (s), Elapsed Time / exec (s), CPU Time (s), Cluster Wait (s), LIO, PIO, Rows, Exec, Parse Count, PX Exec, Time Rank, AAS
    Description: Similar columns to awr_topsqlx, but this time showing just the top 20 SQLs across SNAP_IDs.

awr_unstable_plans (by Kerry Osborne)
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SQLSTAT
    Data presented: SQL_ID, Executions, Min/Max/Avg Etime, Avg LIO, STD_DEV
    Description: This script finds SQL statements with plan instability. I like its clever use of standard deviation to show SQLs with variable elapsed time.
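Here is a hedged Python sketch of the standard-deviation idea behind awr_unstable_plans. It assumes the approach is as follows:

1. Take the average elapsed time per execution for each plan of a SQL_ID.
2. Compute the sample standard deviation of those averages.
3. Divide by the fastest plan's average, so that the spread is relative.

The function name and the cut-off of 2 are assumptions for illustration, not a copy of Kerry Osborne's script.

```python
import math
from statistics import stdev


def normalized_stddev(avg_etime_per_plan: list[float]) -> float:
    """Sample std dev of per-plan average elapsed times, relative to the fastest plan."""
    if len(avg_etime_per_plan) < 2:
        return 0.0  # a single plan cannot be unstable
    fastest = min(avg_etime_per_plan)
    if fastest == 0:
        return math.inf
    return stdev(avg_etime_per_plan) / fastest


THRESHOLD = 2.0  # assumed cut-off for "unstable"

stable = normalized_stddev([0.50, 0.52])    # two plans with similar timings
unstable = normalized_stddev([0.2, 2.0])    # one plan is 10x slower
print(stable > THRESHOLD, unstable > THRESHOLD)  # False True
```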
awr_parm_mods (by Kerry Osborne)
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_PARAMETER, V$INSTANCE
    Data presented: Parameter Name, Old Value, New Value
    Description: This script shows all parameters (including hidden ones) that have been modified.

awr_netwl
    DBA_HIST views: DBA_HIST_SYSMETRIC_SUMMARY
    Data presented: Network Minvalue (MB/s), Network Maxvalue (MB/s), Network Avgvalue (MB/s), Network STD_DEV (MB/s)
    Description: The data comes from the metric family of tables, which shows "Network Traffic Volume Per Sec". Keep in mind that metrics are different from sysstat values. With sysstat you just get the delta and the rate. With metrics the sampling is different: if the snap duration is, say, 10 mins, the metric is sampled at 60-second intervals (num_interval), and the max, min, avg, and std_dev of those samples are recorded.

awr_est_gc_traffic (by John Kanagaraj)
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_SYSSTAT, DBA_HIST_DLM_MISC, V$DATABASE, V$PARAMETER
    Data presented: Estimated Interconnect Traffic (KB)
    Description: This script is ideal for RAC environments and shows the interconnect throughput of an instance. It is very useful if you want to check whether the interconnect is being saturated.

awr_iowl
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_OSSTAT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_SYSSTAT
    Data presented: AAS, CPU IO WAIT Utilization, OS Load, Single Block R/W IOPS, Multi Block R/W IOPS, R/W MB/s, Total R/W IOPS, R/W Ratio, HW Disk IOPS, HW # of Disks
    Description: This script is ideal for monitoring Oracle IO activity, and very useful for sizing and consolidating storage for Oracle databases. It can be used together with a storage monitoring tool to get a complete picture of IO performance. The last two columns use the formulas that storage engineers apply to determine the number of disks needed by the database:
        HW Disk IOPS  = (IOPS * Read Ratio) + (IOPS * Write Ratio * RAID penalty)
        HW # of Disks = Total disk IOPS / IOPS per disk
    Of course the "HW # of Disks" is not the final number. Other factors (bandwidth, throughput, service time, etc.) need to be considered to determine the right storage for a particular IO workload, but this can be your starting point.
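The two awr_iowl storage formulas can be sketched as follows. The inputs are hypothetical values you would supply yourself: the read ratio, the RAID write penalty (2 for RAID 1/10 and 4 for RAID 5 are commonly cited values), and the per-disk IOPS figure. Rounding the disk count up is my assumption, since you can't buy a fraction of a disk.

```python
import math


def hw_disk_iops(iops: float, read_ratio: float, raid_penalty: int) -> float:
    """HW Disk IOPS = (IOPS * Read Ratio) + (IOPS * Write Ratio * RAID penalty)."""
    write_ratio = 1.0 - read_ratio
    return iops * read_ratio + iops * write_ratio * raid_penalty


def disks_needed(total_hw_iops: float, iops_per_disk: float) -> int:
    """HW # of Disks = Total disk IOPS / IOPS per disk, rounded up to whole disks."""
    return math.ceil(total_hw_iops / iops_per_disk)


# Hypothetical workload: 1000 IOPS, 70% reads, RAID 5 (penalty 4), 180 IOPS per disk
backend = hw_disk_iops(1000, 0.7, 4)
print(round(backend), disks_needed(backend, 180))  # 1900 11
```

The write penalty is what makes a write-heavy workload expensive. In the example, 300 write IOPS at the front end become 1200 back-end IOPS.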
    Also, benchmarking will help a lot with storage decisions.

awr_io_ts
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_FILESTATXS, DBA_HIST_TEMPSTATXS
    Data presented: Tablespace R/W IOPS, Tablespace R/W latency
    Description: This script shows the IO performance of the tablespaces. It is the same as what you see in AWR, but across SNAP_IDs. The latency formula is as follows:
        latency (ms) = (readtim / phy reads) * 10
    Keep in mind that in this script the IOPS and latency values are aggregated from all the datafiles of the tablespace. So when diagnosing latency issues, the numbers may not represent actual per-file values. Still, textual trends of high latency (ms) can warn you about particular workload periods, which you can then probe with short-duration samples.

awr_io_file
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_FILESTATXS, DBA_HIST_TEMPSTATXS
    Data presented: Datafile R/W IOPS, Datafile R/W latency
    Description: This script shows the IO performance of the datafiles. It is the same as what you see in AWR, but across SNAP_IDs. Keep in mind that the IOPS and latency values may be normalized (smoothed out) if the snap interval is too long (60 mins and above), compared to a 5-second or 10-minute snap interval. (See Appendix)

r2toolkit [2]
    DBA_HIST views: DBA_HIST_SNAPSHOT, DBA_HIST_DATABASE_INSTANCE, DBA_HIST_SYSSTAT, DBA_HIST_SYSTEM_EVENT, DBA_HIST_SYS_TIME_MODEL, DBA_HIST_OSSTAT, DBA_HIST_WR_CONTROL
    Data presented: Y and X values that can be plotted for Linear Regression
    Description: This is a performance toolkit that uses AWR data and Linear Regression to identify which metric/statistic is driving the database server's workload. The data points can be very useful for capacity planning, giving you informed decisions and completely avoiding guesswork!

You can also do the same kind of mining with Statspack. Each DBA_HIST view has a counterpart Statspack view, and you can achieve similar results:

    DBA_HIST_SNAPSHOT       = STATS$SNAPSHOT
    DBA_HIST_OSSTAT         = STATS$OSSTAT
    DBA_HIST_SYS_TIME_MODEL = STATS$SYS_TIME_MODEL
    DBA_HIST_SYSSTAT        = STATS$SYSSTAT

The scripts mentioned are freely downloadable. You will find more details on the math and performance formulas (rates, time, IOPS, CPU, latency, utilization, AAS) when you look into the SQL code. If you are serious about mining the AWR, I would also suggest that you take the time to play further with the DBA_HIST tables and the underlying data. You'll come away with a better understanding of how the data in the plain AWR report is derived.

PART 3 - Visualization

Average Active Sessions (AAS) has become my default (golden) metric for finding the periods where the database could be having a bottleneck or is just idle. Essentially, AAS is the database load. This value should not go above the CPU count (NUM_CPUS in DBA_HIST_OSSTAT); if it does, the database is either working very hard or waiting a lot for something.

Together, AAS and CPU count are used as a yardstick for a possible performance problem [3]:

    If AAS < 1
      -- Database is not blocked
    AAS ~= 0
      -- Database basically idle
      -- Problems are in the APP not DB
    AAS < # of CPUs
      -- CPU available
      -- Database is probably not blocked
      -- Are any single sessions 100% active?
    AAS > # of CPUs
      -- Could have performance problems
    AAS >> # of CPUs
      -- There is a bottleneck

Just like a doctor's stethoscope, AAS can be your first instrument when investigating performance problems, but it doesn't stop there. For it to be more useful, you must be aware of the components of AAS (much like drilling down on the time components) and have this kind of data over a period of time (across SNAP_IDs). Enterprise Manager does draw nice graphs on the "Performance" and "Top Activity" pages, slicing the AAS components into different "Wait Classes". It also has a "Historical" view from which you can go back and drill down on past load activity. So what could be the problem?
I know some of you have encountered this Enterprise Manager error at some point. You have configured a long AWR retention period (365 days, to exaggerate), but Enterprise Manager won't let you go back any farther, because there was an instance shutdown between the date you want to go to and today. Or there could be some other issue where Enterprise Manager really can't give you the visualization you need.

So what could be the alternative?
1) Use the Top Timed Events SQL (awr_topevents.sql) and focus on the AAS and wait class columns across SNAP_IDs
2) Or use the script together with Perfsheet, a great tool for ad hoc performance visualization [4]

To be consistent with the initial example, we will focus on the same interval, SNAP_ID 335-339. Note that from 6:20 to 7:01 AM the AAS had a sudden spike, in the range of 2.2 to 3.5.

The image below is a stacked area chart of the awr_genwl.sql output. It's clear from the image that there's a big spike in the database load, but we want to know more about it by drilling down on the AAS components.

[Figure: awr_genwl.sql output; awr_topevents.sql output charted using Perfsheet]
Look at the "textual trends" of the awr_topevents.sql output. Just from the AAS column, we can easily tell which AAS component is driving the workload of the database. For the particular SNAP_IDs of high activity, it's evident that there's high User IO activity.

[Figure: awr_topevents.sql output]

Some more background

In the Enterprise Manager "Performance" and "Top Activity" pages you'll see the AAS components sliced into different wait classes. But did you know that their data sources are different?

The 2nd slide of Kyle Hailey's presentation [3] on AAS (Average Active Sessions) says that there are two ways to derive the value:
1) Time Statistics
2) Sampling

AAS on the Performance page uses "Time Statistics" and actually comes from v$system_event + CPU from the time model. This is also what the script awr_topevents.sql does. It unions the output of DBA_HIST_SYSTEM_EVENT with the "CPU" from the time model view DBA_HIST_SYS_TIME_MODEL, then filters only the top 5, and does this across the SNAP_IDs. For graphing purposes, however, I have to include all of the "events" in Perfsheet so that all the AAS values are counted. That makes the chart look similar to the Enterprise Manager Performance page. By the way, on 10g and below, the load chart comes from v$system_event + v$sysstat "CPU used by this session".

AAS on the Top Activity page uses "Sampling". By default it takes advantage of ASH (samples) on a 15 sec refresh rate. But I have observed that when you switch from the Real Time 15 sec refresh to Historical, it starts to behave like the Performance page (pulling data from v$system_event + CPU from the time model).

So what's the effect? During a period of high CPU activity you'll notice a higher AAS on the Top Activity page than on the Performance page. This is simply because ASH samples every active session every second, which is the only way to see CPU usage in real time. The time model CPU updates quicker (5 secs) than v$sysstat "CPU used by this session", but there can still be some lag time. It is also still based on Time Statistics (one of the two ways to calculate AAS), which can be affected by averages.

If you want more details about the Performance and Top Activity pages, these are worth reading: History of Session Load [5] and AAS investigation [14].

Now time for Perfsheet a la Enterprise Manager!

Finding the AAS component that's driving the workload is a lot easier with graphics. The image below shows that we can create the same visualization as Enterprise Manager, broken down into "Wait Class".

[Figure: Stacked area chart, AAS components - wait class]
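The two ways of deriving AAS mentioned in the background above can be contrasted in a small Python sketch. Both functions are hypothetical helpers:

- The time-statistics version divides accumulated session time by elapsed time.
- The sampling version averages the count of active sessions seen in each one-second sample, which is roughly how ASH-based charts arrive at their AAS.

With idealized, made-up data both give the same answer. In practice they diverge for the reasons described above (update lag and averaging).

```python
def aas_time_statistics(db_time_seconds: float, elapsed_seconds: float) -> float:
    """Time statistics: accumulated time sessions spent in the database / elapsed time."""
    return db_time_seconds / elapsed_seconds


def aas_sampling(active_sessions_per_sample: list[int]) -> float:
    """Sampling: average count of active sessions seen in each (1-second) sample."""
    return sum(active_sessions_per_sample) / len(active_sessions_per_sample)


# Made-up window of four 1-second samples
samples = [2, 3, 3, 4]
# Idealized case: each sampled session was active for that whole second,
# so the accumulated DB time is 12 session-seconds over 4 seconds.
print(aas_time_statistics(sum(samples), len(samples)))  # 3.0
print(aas_sampling(samples))                            # 3.0
```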
Even better, we now have the data in our control, so we can play around with it and create interesting graphs. Below, the load is broken down into "Wait Events". Aside from being more colorful, this lets you see which wait events are consuming most of the AAS.

[Figure: Stacked area chart, AAS components - wait events]

Ooops, don't get too excited. An important reminder: the 2-dimensional stacked area chart that Enterprise Manager uses can hide important information and can sometimes be misleading [13]. It really helps to have another view that shows the data clearly separated into its respective components, rather than stacked. Compare the charts above and below and you'll see what I mean: Wait Class and Wait Events in a 3D area chart view can tell a more meaningful story.

Compare the wait class charts. Above, notice the blue band (Other wait class) at an AAS of around 1, while below it is only around 0.1 (hidden between CPU and System IO). That's a big difference!

Then compare the wait event charts: notice the big difference? Above, you can't really tell what's happening. In 3D, you can see that only db file sequential read and direct path read are at an AAS of 1.6 on SNAP_ID 335 and 336. Yes, you would not be fooled if you looked at the raw data, but visualization is much easier and is the way to go. You must, however, be able to sense and validate whether it is driving you to bad conclusions.

[Figure: 3D area chart, AAS components - wait class]
[Figure: 3D area chart, AAS components - wait events]

AAS throughout the AWR retention period!

On my test machine I have a 365-day retention period, which gives me a data warehouse of performance data. You can see from the chart below (a stacked area chart) that the period we are focusing on (6:20 to 7:01 AM, SNAP_ID 335-339) happens to be the highest load period of all the AAS samples over the lifetime of my test database. You can also see the periods of shutdowns (negative values). Other periods where AAS went beyond my maximum CPU could justify drilling down on those specific SNAP_IDs or timeframes. From there you could use ASH, run the AWR report, run ADDM, or make use of your high caliber scripts!
PART 4 - Capacity Planning

Utilization is the ultimate metric!

Capacity planning plays a very important role in ensuring that proper resources are available to handle expected and unexpected workloads. The primary principle is to ensure that the application workload requirements will fit into the available capacity of the database server. For this we need a facility for workload measurement [7]. The good thing is that the data collection process is already being done by AWR. We just need to extract the data and present it in a more meaningful and useful manner.

Measuring the workload gives us the following advantages and benefits [7]:
- Have enough capacity and not over-buy
- Quantify the results of response time optimizations in terms of savings of system resources
- Quantify the benefit of workload reduction

The Introduction to Oracle Server Consolidation paper [6] and Chapter 9 of Craig Shallahamer's book [8] explain in detail what information you need in order to define the Database Server's Capacity, Requirements, and Utilization.

Essentially, what we care about most in Capacity Planning is the database server utilization, which is represented by this formula:

    Utilization = Requirements / Capacity

In the image below, the "empty pitcher" represents the database server capacity. The "glass with water" and "another pitcher with beer" are the Oracle workload requirements. Typically the IT shop makes the decision to purchase the database server; that is, they define the capacity. Then they start pouring the applications into the server. Of course, the application requirements may or may not fit nicely into the available database server capacity. When they don't fit nicely, there can be an excess of capacity, which means IT spent too much. Or it could be the other way around, where the capacity is not enough for the requirements at hand.

The good thing here is, you are not guessing!
The Utilization concept is simple and very useful, and it can be applied in AWR as well. The awr_genwl.sql script presents the data in a way that lets us easily map the performance statistics onto the Utilization formula. With the data presented this way, we can easily apply filters to the data set and immediately find the periods with high workload requirements.
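Filtering an awr_genwl-style data set for high-requirement periods can also be sketched in Python. The field names follow the column names used in this paper's filter examples (`aas`, `oracpupct`, `oscpupct`). The rows are made-up values, and the thresholds are simply the example values this paper uses.

```python
from dataclasses import dataclass


@dataclass
class SnapRow:
    snap_id: int
    aas: float
    oracpupct: float  # Oracle CPU utilization %
    oscpupct: float   # OS CPU utilization %


rows = [  # made-up data points
    SnapRow(334, 0.4, 12, 18),
    SnapRow(335, 2.2, 35, 48),
    SnapRow(336, 3.5, 58, 71),
    SnapRow(337, 0.8, 20, 25),
]

# High-requirement periods: AAS above 1 or OS CPU above 50%
busy = [r.snap_id for r in rows if r.aas > 1 or r.oscpupct > 50]
print(busy)  # [335, 336]
```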
We can do other filtering as well:

    AAS range
        aas > 1
    Per SNAP_ID, or a range of SNAP_IDs
        snap_id in (336)
        snap_id >= 336 and snap_id <= 340
    Oracle CPU Utilization
        oracpupct > 50
    OS CPU Utilization
        oscpupct > 50
    Particular Workload periods
      AND TO_CHAR(s0.END_INTERVAL_TIME,'D') >= 1                 -- Day of week: 1=Sunday 7=Saturday
      AND TO_CHAR(s0.END_INTERVAL_TIME,'D') <= 7
      AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') >= 0900         -- Hour
      AND TO_CHAR(s0.END_INTERVAL_TIME,'HH24MI') <= 1800
      AND s0.END_INTERVAL_TIME >= TO_DATE('2010-jan-17 00:00:00','yyyy-mon-dd hh24:mi:ss')    -- Data range
      AND s0.END_INTERVAL_TIME <= TO_DATE('2010-aug-22 23:59:59','yyyy-mon-dd hh24:mi:ss')

CPU sizing recommendations

This data output can easily be used as input to the CPU sizing of a database server.

The data points below came from an actual production server that needs to be migrated to a new machine. The server is a dual core machine that has been in use for almost 8 years, and it has had a couple of hardware errors. The management would like to know what the ideal machine would be and how many cores will be needed to handle the workload of the database.

The formula used to derive the "CPU core need" [9] is as follows:

    core need = # of cores * utilization * 1.25

The data points were very useful for characterizing the current utilization of the database server. Since the server was collocated in a data center, we could opt to just upgrade to a newer model (not the latest and greatest), or we could virtualize it on newer hardware.
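The "CPU core need" formula is simple to apply. In this sketch, utilization is expressed as a fraction, the 60% peak is a hypothetical value, and rounding up to whole cores is my assumption.

One way to read the 1.25 factor is as headroom. Assuming cores of similar speed, the new cores would be about 80% (1/1.25) busy at the observed peak.

```python
import math


def core_need(cores: int, utilization: float) -> float:
    """core need = # of cores * utilization * 1.25 (utilization as a fraction)."""
    return cores * utilization * 1.25


need = core_need(2, 0.60)       # dual core server at a hypothetical 60% peak
print(need, math.ceil(need))    # 1.5 2
```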
  18. 18. Storage sizing recommendations

This data output can easily be used as input to the storage sizing of a database server.

But notice the outlier (shown in red above), which represents a SNAP period with high CPU utilization. Statistically summarizing the data tells me that most of the time I am below 10% CPU utilization. Still, we don't want to ignore the outlier just like that, because there might be a critical application process in that workload period. When we validated with the application owner, she confirmed that it was indeed an ad hoc process that is run once a year. With this information, we can safely remove the outlier from the data points. Even if the ad hoc process runs again on the new server, we just have to make sure it runs in an off-peak period so that it does not affect the overall connected users.
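One way to flag candidates like the red outlier before talking to the application owner is Tukey's fences (values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR). This is a swapped-in, commonly used rule, not necessarily how the outlier above was identified. The CPU percentages below are hypothetical. A flagged point should only be dropped after it has been validated, as described above.

```python
import statistics


def split_outliers(values, k=1.5):
    """Split values into (kept, outliers) using Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical OS CPU utilization (%) per snapshot, with one ad hoc spike.
cpu_pct = [6, 7, 5, 8, 9, 6, 7, 8, 5, 92]
kept, outliers = split_outliers(cpu_pct)
print(outliers, statistics.median(kept))  # → [92] 7
```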
  19. 19. The data points below also came from awr_genwl.sql, this time for sizing the storage of the same production system mentioned above. They show the IOPS requirements needed to run the database on the new environment. This can be combined with a storage monitoring tool to get a complete picture of IO performance. Having the measured data makes it easy to translate requirements into capacity.

Also note that other factors (bandwidth, throughput, service time, etc.) need to be considered to determine the right storage for a particular IO workload, but this can be your starting point. Benchmarking will also help a lot with storage decisions. For storage sizing purposes, I strongly recommend using awr_iowl.sql.
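Turning per-snapshot IO counts into an IOPS requirement is simple arithmetic: requests in the interval divided by the interval length in seconds. The sketch below assumes you already have per-interval deltas of read and write requests. The `IoDelta` fields are hypothetical names, not awr_genwl.sql or awr_iowl.sql output columns, and the numbers are made up. It picks the peak interval as the requirement, which covers IOPS only. Bandwidth and service time still need separate consideration.

```python
from dataclasses import dataclass


@dataclass
class IoDelta:
    snap_id: int    # hypothetical snapshot identifier
    reads: int      # physical read requests during the interval (delta)
    writes: int     # physical write requests during the interval (delta)
    seconds: float  # elapsed seconds of the snapshot interval


def iops(d: IoDelta) -> float:
    """IO requests per second for one snapshot interval."""
    return (d.reads + d.writes) / d.seconds


def iops_requirement(deltas):
    """Return (peak IOPS, snap_id of the peak) across snapshot intervals."""
    peak = max(deltas, key=iops)
    return iops(peak), peak.snap_id


deltas = [IoDelta(101, 360000, 90000, 3600), IoDelta(102, 1080000, 360000, 3600)]
print(iops_requirement(deltas))  # → (400.0, 102)
```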
  20. 20. Real World Example: Diagnosing and Resolving GC Block Lost

The graph shows a sudden slowdown at a client running 2 nodes of RAC. It happened during month-end processing, which is the most critical week of the month. When interviewed, the DBA insisted that they had not made any changes to the database environment… which is what most customers will say about a performance problem, so the task of finding where/when/why it went wrong was left entirely to us.

So it was a sudden slowdown, and I was thinking… if I could have the time series performance of both nodes plotted in one graph, that would answer a lot of questions. Having just attended Tanel Poder's seminar in Singapore, I was able to apply what I had learned. I made use of PerfSheet, played around with the visualization, and was able to achieve what I had envisioned.

In the image above you can see the where, when, and why. Most of the load is on the first node. The peaks are the particular periods we are interested in, and the wait events contributing to the peaks are the suspects, the possible culprits of the performance problem. We drilled down further on those peak periods and on the particular database sessions running the critical modules that were slow. By also correlating them with the database advisors and OS statistics (CPU, memory, network), we were able to conclude that it was a network interconnect switch problem.

If it weren't for this visualization, the troubleshooting would have taken longer.

This is the image after the network interconnect switch was replaced… it shows their normal workload.
  21. 21. Linear Regression of AAS and CPU (Node 1 and Node 2)

Mining AWR data, backed by solid statistical analysis [10] [11] [12], lets you make forecasts that can guide you toward targeted response time optimization and workload reduction.

The graph below is a scatter plot from a production environment with 2 nodes of 11gR1 RAC running on 8-core HS21 Bladeservers with a DS4800 SAN. Notice the R² (coefficient of determination) values of .97 and .89, respectively, which show a strong correlation between AAS and CPU utilization. Also, when the CPU starts to queue at >80% utilization, the AAS shoots up as well!

The drill-down below on the peak period, with an AAS value of 10, shows that the workload is driven by high load SQL that greatly affects the overall performance of the database. Also note that the largest chunk of the AAS is on "CPU". That is why you will see large LIOs, with most of the elapsed time spent on CPU, when you look at the SQL details in awr_topsqlx. Tuning the high load SQL will result in a large workload reduction, response time optimization, and huge savings on system resources.

When the server's workload is at an AAS value of 2.2, the CPU utilization, the latency, and the AAS component on "CPU" all appear low. You will also notice that the top SQL from the AAS of 10 period is no longer there.
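The R² values quoted for the scatter plots come from a simple linear fit. Here is a minimal ordinary least squares sketch that returns the intercept, slope and R², assuming CPU utilization as x and AAS as y. The sample points are hypothetical, not the data behind the graph. For a single x variable, R² is the same whichever variable you put on which axis.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


# Hypothetical per-snapshot samples: CPU utilization (%) vs. AAS.
cpu_pct = [12, 25, 38, 52, 75, 96]
aas = [1.0, 2.2, 3.5, 5.0, 7.5, 10.0]
a, b, r2 = linear_fit(cpu_pct, aas)
print(f"AAS = {a:.2f} + {b:.3f} * CPU%  (R^2 = {r2:.3f})")
```

A high R² says the two move together. It does not say that one causes the other, which is why the drill-down into the peak period is still needed.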
  22. 22. Drilling down on the peak workload... with AAS of 10

1) General Workload report
2) Tablespace IO report
3) Top Timed Events

The performance toolkit uses AWR data and Linear Regression to identify what metric/statistic is driving the database server's workload based on AAS. The data points can be very useful for capacity planning, giving you informed decisions and completely avoiding guesswork!

The toolkit contains 7 sections; see the brief descriptions below:
- CREATE USER - creates the r2toolkit user
- DROP TABLES - drops the tables for a fresh start
- CREATE THE r2 TABLES - creates the main tables
- POPULATE y data - y data is the "dependent value", the variable whose value is to be predicted
- ANALYZE r2 VALUES - gets the stat names with high r2 values, to have a more accurate analysis
- POPULATE x and residual data - x data is the "independent value", used to predict the value of y
- R2 REPORT - generates the textual report and r2 values with or w/o outliers
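The "ANALYZE r2 VALUES" step can be pictured as scoring every candidate x statistic against y = AAS and keeping the ones with high r2. The sketch below does that in plain Python rather than in the toolkit's SQL tables. It is an illustration of the idea, not the toolkit's implementation. The stat names are only labels and the values are hypothetical. For one x variable, the squared Pearson correlation equals the R² of the linear fit.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, equal to R^2 of a simple linear fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return sxy * sxy / (sxx * syy)


def rank_metrics(y, metrics, min_r2=0.0):
    """Rank candidate x metrics (stat name -> values) by R^2 against y (AAS)."""
    scored = [(name, r_squared(xs, y)) for name, xs in metrics.items()]
    kept = [s for s in scored if s[1] >= min_r2]
    return sorted(kept, key=lambda s: s[1], reverse=True)


aas = [1, 2, 3, 4, 5]  # y: dependent value, one entry per snapshot
metrics = {            # x: hypothetical per-snapshot deltas
    "CPU used by this session": [100, 210, 290, 405, 500],
    "session logical reads": [5, 3, 6, 2, 4],
    "user commits": [7, 7, 7, 7, 7],
}
for name, r2 in rank_metrics(aas, metrics):
    print(f"{name:28s} r2={r2:.3f}")
```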
  23. 23. 4) Top 20 SQLs
6) Top 5 SQLs of SNAP_ID 8631.. which, by the way, had an AAS of 10

Now on the low workload period… with AAS of 2.2

1) General Workload report
2) Tablespace IO report
3) Top Timed Events
4) Top 20 SQLs
No entry - the top SQL from AAS of 10 is not here anymore
6) Top 5 SQLs on SNAP_ID 8582
  24. 24. References

[1] Craig Shallahamer - Oracle Performance Firefighting - Chapter 1
[2] r2project - http://karlarao.tiddlyspot.com/#r2project
[3] Kyle Hailey Seminar - AAS presentation
[4] Tanel Poder - Perfsheet - http://www.tanelpoder.com/files/PerfSheet.zip
[5] History of session load - http://sites.google.com/site/youvisualize/active-session-history
[6] Craig Shallahamer - Introduction To Oracle Server Consolidation
[7] Andy Rivenes - Oracle Workload Measurement
[8] Craig Shallahamer - Oracle Performance Firefighting - Chapter 9
[9] Husnu Sensoy - Database Consolidation Best Practices - http://husnusensoy.files.wordpress.com/2010/05/database-consolidation-best-practices.pdf
[10] Forecasting Oracle Performance
[11] Statistics Without Tears
  25. 25. [12] Neeraj Bahatia - Linear Regression Paper
[13] Neil Gunther & Tanel Poder - Multidimensional Visualization of Oracle Performance using Barry007 - http://arxiv.org/pdf/0809.2532
[14] AAS investigation - http://goo.gl/5WaAg

Other references:
- http://karlarao.wordpress.com
- Storage IOPS, capacity, performance, cost - http://goo.gl/FCN0w
- http://karlarao.tiddlyspot.com/#Statistics
- http://karlarao.tiddlyspot.com/#OraclePerformance

Appendix - Average Latency Issue

The IO latency formula used in AWR is as follows:

    latency (ms) = (readtim / phy reads) * 10

The images below show that latency values may be normalized (averaged out) if the snap interval is too long compared to shorter snap intervals. Also read this link on the effects of CPU scheduling issues on latency: http://www.freelists.org/post/oracle-l/Disk-Device-Busy-What-exactly-is-this
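The normalization effect follows directly from the formula. One long interval divides the total read time by the total reads, so a short spike is averaged with the quiet periods around it. The sketch below assumes per-interval deltas of READTIM, in hundredths of a second as in V$FILESTAT (hence the `* 10` to get milliseconds), and of physical reads. All numbers are hypothetical.

```python
def latency_ms(readtim_cs, phyrds):
    """Average read latency (ms) = (readtim / phy reads) * 10.

    readtim is in centiseconds, so multiplying by 10 gives milliseconds.
    """
    return readtim_cs / phyrds * 10


# Hypothetical 4 x 15-minute intervals: (readtim delta, phyrds delta).
intervals = [(2000, 4000), (2100, 4200), (30000, 5000), (1900, 3800)]

per_interval = [latency_ms(t, r) for t, r in intervals]

# The same hour seen through a single 60-minute snapshot.
total_t = sum(t for t, _ in intervals)
total_r = sum(r for _, r in intervals)
one_hour = latency_ms(total_t, total_r)

print(per_interval, round(one_hour, 1))  # → [5.0, 5.0, 60.0, 5.0] 21.2
```

The 60 ms spike in one 15-minute interval shows up as about 21 ms in the hourly figure, which is why shorter snap intervals give a truer picture of latency peaks.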