Transcript of "Enough Blame for System Performance Issues"
Mahesh Vallampati ◦ Career: Director of Oracle Applications and Technologies at Coach America, the Bus Company; Sales and Consulting at Hotsos and SmartDog; Director of IT Database Services at Ceva Logistics; Sales and Consulting at Oracle for 9 years ◦ Education: Courses in Business at Houston Community College; Master’s in EE from Texas A&M University ◦ Career Focus: Used to be DBA; now Techno-Functional (Fechnical)
White Papers ◦ http://www.slideshare.net/mvallamp Email: email@example.com Blog: http://mvallamp.blogspot.com Linked in Group Leader: DBA Manager Oracle Alumni Admin for content: 5000 members
Performance Issues ◦ Blame ◦ Blame the Business ◦ Blame the Functional Team ◦ Blame the Developer ◦ Blame the DBA ◦ Conclusions
Performance Issues ◦ Described in vague terms: The system is slow; The disk is slow; The network is slow; The SQL sucks; The server sucks; The problem is in the application server; The database is slow; The code is awful ◦ My favorite: Slow as molasses
Issue Approach ◦ Approaches that sound sophisticated but are not really effective (my opinion): The BCHR is low or high; We need to run Statspack; AWR is better; No, ASH is better; “My” method is better; Have you run statistics?; Have you rebuilt the indexes?; Let us take a look at the top 10 SQL statements; The average physical I/O is high
Let us Blame somebody. Let us not leave anybody out. Let us Blame ◦ The Business People ◦ The Functional Team ◦ The Developers ◦ The DBAs
Manufacturing Company In the Midwest Migrated from Mainframe to Oracle Applications (11.5.10) Did Big Bang implementation of Manufacturing, Order Management and Financials System was barely usable Flew there the day they went live after receiving panic calls
Implementation vendor was there Another “expert” vendor was there ◦ BCHR and Block Size changes were being discussed Every theory possible was floating around Listened for an hour to the issues Asked a basic question ◦ Are you losing money and if so why?
$$$$ ◦ We are losing money ◦ We can’t take orders ◦ Order Management is not able to take orders. It is taking 10 minutes to enter a 6-line order. It used to take less than a minute on the mainframe. Good place to start. They can pay my bills only if they make money.
Traced the Order Management process ◦ Entering a header ◦ Entering a line. Line Trace ◦ Entering a line was taking a minute and a half. Most of the time was spent calculating the price of an item. Enter “Advanced” Pricing.
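A quick arithmetic check, using only the numbers quoted on these slides (about 10 minutes for a 6-line order, about a minute and a half per line), suggests why the line trace was the right place to dig. The variable names are illustrative.

```python
# Numbers quoted in the talk.
order_minutes = 10.0      # time to enter one 6-line order
lines_per_order = 6
minutes_per_line = 1.5    # traced time to enter a single order line

# Share of the total order time that is spent entering lines.
line_entry_minutes = lines_per_order * minutes_per_line
share = line_entry_minutes / order_minutes

print(f"Line entry: {line_entry_minutes:.1f} of {order_minutes:.1f} minutes ({share:.0%})")
```

On these figures, line entry accounts for roughly nine of the ten minutes, so the header is not worth tuning first.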
Can we use Simple Pricing instead of Advanced Pricing? Enter the “Business” Discussion with COO ◦ Why are you using Advanced Pricing? We can make more money. ◦ How much more money can you make with Advanced Pricing instead of Simple Pricing? Don’t know
Potential upside on 1 billion USD in revenue ◦ 4 million USD a year if we are lucky. Investment required to make the system perform at acceptable levels ◦ Cost: 10 to 15 million dollars ◦ Standby Reporting Archive/Purge ◦ Hardware Investments ◦ Software Investments ◦ Investments in people, processes etc.
Probability: 50%. Discount Rate: 5%.

                      Year 1       Year 2       Year 3       Year 4       Year 5
Revenue Upside        $4,000,000   $4,000,000   $4,000,000   $4,000,000   $4,000,000

NPV                   $17,317,907    NPV for 5 years
Probable Revenue      $8,658,953     NPV times the probability
Investment Required   $15,000,000    Capital investment required. Depreciation not included.
Profit                ($6,341,047)   Revenue minus cost incurred
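The figures on this slide can be reproduced with a few lines of Python. This is a minimal sketch that assumes the revenue arrives at the end of each year, an assumption that happens to match the slide's NPV. The function and variable names are illustrative, not from the talk.

```python
def npv(cash_flows, rate):
    """Present value of end-of-year cash flows discounted at `rate`."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

revenue_upside = [4_000_000] * 5   # Year 1 through Year 5
discount_rate = 0.05
probability = 0.50
investment = 15_000_000            # upper end of the 10 to 15 million estimate

npv_5yr = npv(revenue_upside, discount_rate)
probable_revenue = npv_5yr * probability
profit = probable_revenue - investment

print(f"NPV for 5 years:   ${npv_5yr:,.0f}")
print(f"Probable revenue:  ${probable_revenue:,.0f}")
print(f"Profit:            ${profit:,.0f}")
```

Setting `investment` to 10,000,000, the low end of the estimate, still leaves the result negative, because the probable revenue is under 9 million.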
Question to the COO ◦ Would you spend 15 million USD in today’s dollars for a potential upside of 4 million dollars ◦ Economic forecast is gloomy ◦ Interest and Borrowing costs are high Right Answer, No.
Changed from Advanced Pricing to Simple Pricing The system was fast in performing transactions System had over-capacity Even in month end, the system did not even flinch Performance team disbanded Rebuilding Indexes and tables into bigger block table-spaces got a yawn Solved in 2 days
There is not enough hardware in the world to support an irrational business requirement Ask why complex functionality is required when simple functionality will do Ask the “Business” to justify need for “Advanced Functionality” Solution ◦ Switch to Simple Pricing ◦ System Stabilized ◦ Project started on Monday Morning ◦ Project Ended Wednesday Afternoon ◦ Back Home Wednesday night
Manufacturing Company in Ohio. Migrated to Oracle Applications. Severe Performance Issues. Order Management is slow. Call center for customers is just horrific. Each call was taking 10-15 minutes; on the mainframe each call took 2-3 minutes. Customer was using Configurator and BOM.
Method-R Traced the Order Management Process which uses the Configurator Identified the SQL Statement that was taking a long time
SQL statement was performing close to 900,000 Logical I/Os per execution. The SQL statement was executed for every item in the BOM that makes up an order line. The more complex the BOM, the more items in the BOM, the slower the process. The SQL statement was determining whether the item needed to be displayed in the invoice for the customer and in the packing slip.
SELECT x
FROM BOM_BILL_OF_MATERIALS BOM, BOM_INVENTORY_COMPONENTS COMP,
     MTL_SYSTEM_ITEMS_B MTL, MTL_SYSTEM_ITEMS_B MTL_BOM, MTL_PARAMETERS MP
WHERE BOM.COMMON_BILL_SEQUENCE_ID = COMP.BILL_SEQUENCE_ID
  AND COMP.COMPONENT_ITEM_ID = MTL.INVENTORY_ITEM_ID
  AND MTL.ORGANIZATION_ID = MP.ORGANIZATION_ID
  AND (MTL_BOM.SEGMENT1 LIKE '%_ITEMS'
       OR MTL_BOM.SEGMENT1 LIKE '%_ITEM'
       OR MTL_BOM.SEGMENT1 LIKE '%_ITEMS2'
       OR MTL.BOM_ITEM_TYPE = :B3
       OR MTL.SEGMENT1 LIKE 'OC_%')
  AND BOM.ASSEMBLY_ITEM_ID = MTL_BOM.INVENTORY_ITEM_ID
  AND BOM.ORGANIZATION_ID = MP.ORGANIZATION_ID
  AND BOM.ORGANIZATION_ID = MTL_BOM.ORGANIZATION_ID
  AND MP.ORGANIZATION_CODE = :B2
  AND NVL(COMP.EFFECTIVITY_DATE, SYSDATE) <= SYSDATE
  AND NVL(COMP.DISABLE_DATE, SYSDATE) >= SYSDATE
  AND COMP.COMPONENT_ITEM_ID = :B1
Should we tune the SQL? Three of the predicates were using leading wildcards ◦ AND (MTL_BOM.SEGMENT1 LIKE '%_ITEMS' ◦ OR MTL_BOM.SEGMENT1 LIKE '%_ITEM' ◦ OR MTL_BOM.SEGMENT1 LIKE '%_ITEMS2' ◦ OR MTL.BOM_ITEM_TYPE = :B3 ◦ OR MTL.SEGMENT1 LIKE 'OC_%')
First instinct ◦ Blame the developer. Met with the developer ◦ He said that was the functional spec. Met with the functional person ◦ Asked them why they were looking for %ITEM, %ITEMS, %ITEMS2 and OC_%
Functional Person explanation ◦ Oh, it is a convention we use ◦ If the item type ends in ITEM, ITEMS or ITEMS2 or starts with OC, then display it ◦ Otherwise, don’t display it ◦ They were proud of it too
Should we create a reverse string functional index and switch the predicates? Umm..No
Called a meeting ◦ Business Users ◦ Functional Team ◦ Developers ◦ DBAs. Took a dictionary to the meeting ◦ Asked them to quickly find words that ended with M or S ◦ Everybody looked at me as if I was not sane ◦ I explained that this was what they were asking the system to do
Created a flexfield with a Y or N at the inventory item level. Updated the field to indicate the display status. Rewrote the SQL statement to use the flag instead of the naming convention. The SQL now used 40 Logical I/Os, down from close to 900,000. The system became fast enough that call center productivity went up.
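The idea of replacing the naming convention with a stored flag can be sketched with Python's built-in `sqlite3` module. This is an illustration only. SQLite stands in for Oracle, and the `items` table, the `display_flag` column and the sample item names are hypothetical. The query plans shown are SQLite's, not Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY,"
             " segment1 TEXT, display_flag TEXT)")
names = ["WIDGET_ITEM", "BOLT_ITEMS", "KIT_ITEMS2", "OC_PANEL", "GASKET", "MOTOR"]
conn.executemany("INSERT INTO items (segment1) VALUES (?)", [(n,) for n in names])

# The naming convention described by the functional team.
convention = ("segment1 LIKE '%ITEM' OR segment1 LIKE '%ITEMS' "
              "OR segment1 LIKE '%ITEMS2' OR segment1 LIKE 'OC%'")

# One-time backfill: evaluate the convention once and store the answer.
conn.execute(f"UPDATE items SET display_flag ="
             f" CASE WHEN {convention} THEN 'Y' ELSE 'N' END")
conn.execute("CREATE INDEX items_display_flag ON items (display_flag)")

def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

by_convention = f"SELECT segment1 FROM items WHERE {convention}"
by_flag = "SELECT segment1 FROM items WHERE display_flag = 'Y'"

print(plan(by_convention))  # a scan: every row is read and pattern-matched
print(plan(by_flag))        # a search: the index on the flag is used
```

Both queries return the same items. The difference is that the flag is an equality test the database can index, while a pattern with a leading wildcard forces it to read every row, like looking through a dictionary for words ending in M or S.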
Try to answer the SQL statement yourself as if you did not have a system Would it make sense? If no, then the system is probably going to be bad at it too and will be slow Systems can process and automate irrational requests The cost and pain is going to be high System can do dumb but slow (Exadata anybody?)
Manufacturing Company in Illinois Customer had implemented Oracle Applications Shipping Application is slow SR with Oracle Support going nowhere Cannot determine if it is custom or standard
Method-R. Traced the Shipping Process. Each line in the shipping screen was taking 20-25 seconds. Identified the SQL statement that was taking a long time. Each execution was taking 650,000 Logical I/Os.
Oracle Financials has flexfields which have multiple purposes. They have to be varchar2. It is a common practice to store key values in these columns. Key values tend to be numbers. Joins are made from numeric key fields to these flexfields. If you see an implicit type conversion happening in the plan, this is usually the issue.
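The slides do not show the rewritten query, so the following is only a sketch of the mechanism. When Oracle compares a character column with a number, it converts the column, so an ordinary index on that column cannot be used. SQLite does not behave the same way, so the conversion is written out explicitly with `CAST` to imitate it. The `shipments` table and its `attribute1` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY,"
             " attribute1 TEXT)")
# attribute1 plays the role of a flexfield: text that holds numeric keys.
conn.executemany("INSERT INTO shipments (attribute1) VALUES (?)",
                 [(str(n),) for n in range(1000)])
conn.execute("CREATE INDEX shipments_attr1 ON shipments (attribute1)")

def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Conversion applied to the column: every row must be converted and compared.
convert_column = ("SELECT shipment_id FROM shipments"
                  " WHERE CAST(attribute1 AS INTEGER) = 42")
# Conversion applied to the value instead: the index on the column is usable.
convert_value = "SELECT shipment_id FROM shipments WHERE attribute1 = '42'"

print(plan(convert_column))  # a scan
print(plan(convert_value))   # an index search
```

Both forms return the same row here. That equivalence is an assumption about the data: it holds only when the stored text is the canonical form of the number, with no padding or leading zeros.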
Query now executes 700 Logical I/O’s down from 650,000 LIO’s Query now has sub-second response Productivity is now way up. Life is good
The developer should have: known he was joining a varchar to a number field; recognized that it could be an issue; checked the number of Logical I/Os it was doing; examined the explain plan. Can we have a “lint” for SQL?
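As a thought experiment, the smallest useful SQL "lint" for this case would compare the declared types on both sides of each join. The sketch below is hypothetical throughout. The catalog is a hand-written dictionary, and the join list is supplied by hand rather than parsed from SQL. In Oracle the declared types could presumably be read from a data dictionary view such as `ALL_TAB_COLUMNS`.

```python
# Hypothetical column catalog: (table, column) -> declared type.
COLUMN_TYPES = {
    ("orders", "order_id"): "NUMBER",
    ("shipments", "order_id"): "NUMBER",
    ("shipments", "attribute1"): "VARCHAR2",
}

def lint_joins(joins, column_types=COLUMN_TYPES):
    """Return one warning per join whose two columns have different declared types."""
    warnings = []
    for left, right in joins:
        left_type, right_type = column_types[left], column_types[right]
        if left_type != right_type:
            warnings.append(
                f"{left[0]}.{left[1]} ({left_type}) joined to "
                f"{right[0]}.{right[1]} ({right_type}): implicit conversion likely"
            )
    return warnings

joins = [
    (("orders", "order_id"), ("shipments", "attribute1")),  # number to text
    (("orders", "order_id"), ("shipments", "order_id")),    # number to number
]
for warning in lint_joins(joins):
    print(warning)
```

Only the first join is reported. A real tool would need to extract the join conditions from the SQL text itself, which is the hard part and is not attempted here.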
Customer was a food distributor in Illinois They were going live on Oracle Financials They had several customizations which they were rolling out Customizations were released as a series of scripts The scripts were rolled out at 2:30 PM Testing would start
Performance was horrible. Find Customers, Find Orders and Find Shipment forms would take forever. The next day the system performance was fine. Everything was fast. Everybody was stunned ◦ Client DBA team ◦ Oracle Consulting Team
Traced the processes using 10046. A lot of joins to custom temporary tables were observed for queries with poor performance. A little more digging showed that almost all performance issues were related to the temporary tables. So what is it with these temporary tables that they performed well the day after they were deployed?
After a few hours of searching, did a quick explain plan in SQL Developer. Noticed the “Rule” in the explain plan. Wait a minute, we should be using the CBO. This is a 9i database. Hmm... Let us look at the temporary tables. Aaah...
Stats on temporary tables were not being collected as a part of the code deployment. A nightly process that gathered stats on all objects without statistics took care of the rest. That explained why performance was poor the day the code was deployed and better the day after.
This was interesting. Whose job was it to have the stats gathering script as a part of the deployment? Was it the DBA? Or the developer? It is a moot point when IT is on the line. Is it the linebacker's or the safety’s fault in a busted play? Who cares if you lose?
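One way to keep statistics from being forgotten is to derive the stats-gathering step from the deployment script itself. The sketch below is a hypothetical helper, not what the team in the talk used. It finds the tables a script creates and emits one `DBMS_STATS.GATHER_TABLE_STATS` call per table, to be run after the tables are populated. The owner `XXCUSTOM` and the table names are made up, and schema-qualified table names are not handled.

```python
import re

# Matches CREATE TABLE and CREATE GLOBAL TEMPORARY TABLE, capturing the table name.
CREATE_TABLE = re.compile(
    r"CREATE\s+(?:GLOBAL\s+TEMPORARY\s+)?TABLE\s+([A-Za-z_][\w$#]*)",
    re.IGNORECASE,
)

def stats_calls(deployment_sql, owner):
    """Return one SQL*Plus stats-gathering command per table the script creates."""
    tables = CREATE_TABLE.findall(deployment_sql)
    return [
        f"EXEC DBMS_STATS.GATHER_TABLE_STATS("
        f"ownname => '{owner}', tabname => '{table.upper()}')"
        for table in tables
    ]

# Hypothetical deployment script.
script = """
CREATE TABLE xx_find_customers_tmp (customer_id NUMBER, name VARCHAR2(240));
CREATE TABLE xx_find_orders_tmp (order_id NUMBER);
"""

for call in stats_calls(script, owner="XXCUSTOM"):
    print(call)
```

Whoever owns the deployment, a generated step like this means the question of whose job it was never has to come up on go-live day.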
Use Method-R to get to the root cause Use other methods to confirm the problem ◦ Statspack ◦ Grid Control ◦ Homegrown scripts Avoid your first instinct
First Instinct ◦ Try to tune the SQL ◦ Try to solve the problem ◦ Wrong Approach Correct Approach ◦ Identify blame ◦ Assign Blame (Important) ◦ Ask the “makes sense” question ◦ Will it make us money? ◦ If not, don’t do it
Eliminate the problem Train the user not to create the problem Reschedule the problem Tune/Optimize the problem Relocate the problem