OCS and Helpdesk are the applications monitored using dynaTrace. Core system based on Oracle (Retek) RMS. OCS = Order Capture System. Our key area of interest is OCS, which is used by Telesales and external customers, but also the helpdesk application. Order entry is via EPOS devices and WebUI. OCS also gives access to Order History, Product Information, Special Offers etc. Helpdesk and OCS share a common platform.
Internal and external clients. Low volume, high complexity. Multi-tier architecture.
Cut off points – significant peaks in demand. Member / customer retention is critical to the business. User satisfaction is key to customer retention. Load on the application is increasing over time as the business grows. The application is also becoming more complex and users are spending longer on the site. Load on the website is continually increasing (+20% this year). Customer expectations are higher: other retail sites are fast, which drives expectations for this site. Continual improvements required (46% faster this year, but this trend needs to continue).
Intechnica are partners of Compuware, specialising in application development and performance. Consultancy – IT strategy and BPM (BPM = Business Process Management). Involved in application development – Microsoft and AWS partner. Performance improvement and assurance – developers and testers working together to improve clients’ application performance. We provide cloud consultancy in three areas – cloud environment provisioning, cloud app dev and cloud testing.
Intechnica has been working with Nisa for 6 years and is currently involved in strategic performance improvements and tactical problem resolution.
…….the way we’re able to do both those things is via the dynaTrace PurePath…
PurePath is part of the dynaTrace core technology. A PurePath helps to identify performance down to a transaction level and stores relevant information, including timing information, through each application layer. Helps to identify what proportion of time is spent in network, application tier, database, web servers etc. This includes public, private, hybrid and heterogeneous (.NET/Java) cloud. Can also track synthetic users through an application to measure performance even when your users aren’t online.
Mention: a good introduction to a PurePath is also available on YouTube, “dynaTrace PurePath Technology 2-Minute Explainer” - http://www.youtube.com/watch?v=V_Ydfows_xo As well as identifying poorly performing transactions, can drill down into code and look at sequence diagrams, highlighting queries that could be cached to improve performance etc. Gives visibility of system issues and assists in real-time monitoring.
Here’s where we get our PurePaths from!
Our dynaTrace server has its own SQL installation and sessions store, and also hosts the collector. As I mentioned earlier, high complexity but low volume. Larger environments with more transaction monitoring require more servers to host dT. Sizing guide available in documentation, or come and ask us / Compuware. Currently only monitor a single server, plan to increase scope in the near future…
On a tactical level, this enables us to…
Tactical - Analysis of an outage. Functional Health, Error drill down, Identifying problem pages. Will take you through this in a “real life” example later.
Sites are getting more complex, involving 3rd party content; dT can help identify external problems. I have an example of this…
Can also use e.g. Layer breakdown to diagnose a problem with a particular application tier. E.g. in this case blue = ADO time (ActiveX Data Objects) – Microsoft’s database client. Increases in this layer whilst others remained constant could be indicative of poor DB performance. dT can help with identifying problems, right through from “Fault to Fix”.
Strategic – long term view. Identifying “db heavy” transactions. Identifying all slow pages and ranking them helps with prioritising performance optimisation work. Quantifying “slowness”, e.g. Is it just today? Are we busier than normal? Or is something wrong? Performance test analysis: monitor performance tests to get a deep understanding of applications before they go live. Quantifying improvements through the test lifecycle and into production. Moving to longer term analysis.
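The layer comparison described above can be sketched in a few lines. This is only an illustration of the reasoning, not how dynaTrace computes its layer breakdown; the function name, the layer names, the timings and the 1.5x threshold are all assumptions.

```python
def layers_that_grew(baseline: dict, current: dict, factor: float = 1.5) -> list:
    """Return layers whose time grew by more than `factor` versus the baseline.

    `baseline` and `current` map a layer name to seconds spent in that layer.
    The threshold is an arbitrary illustrative choice.
    """
    return sorted(
        layer
        for layer, seconds in current.items()
        if layer in baseline
        and baseline[layer] > 0
        and seconds > baseline[layer] * factor
    )


# Hypothetical timings: only the ADO (database client) layer has grown.
baseline = {"ASP": 0.80, "ADO": 0.40, "Web service": 0.20}
current = {"ASP": 0.85, "ADO": 1.60, "Web service": 0.20}
suspects = layers_that_grew(baseline, current)
```

If only the database client layer is flagged while the others stay flat, the database is the first place to look.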
Next slides take us through some “real world” scenarios at Nisa.
Tactical example. Received a phone call to say that members were complaining of difficulties logging into OCS. 14:15 - Approaching afternoon cut-off, so a critical time for the business. Inability to place orders causes delays to the daily order and has knock-on effects throughout the business: warehousing, stock control, just-in-time ordering etc. Very efficient warehouse operation, saves costs through efficiencies (JIT etc.) but tight schedules for deliveries / despatches, over 1000 vehicle movements per shift.
This screenshot was taken “after the event”, because during the triage of the problem I didn’t hang around to take screenshots! Approx 5 minutes into the problem, clicked on the “red bar”. Drilled down to the errors page. Could see that we had experienced 16 errors in this 5-minute period. Not a crisis at this stage but a definite cause for concern.
Drilling down into the errors to the “pure paths” view showed that all the errors were related to the home page. We drilled further and found that the cause was a problem with Umbraco (which handles the customer-specific content). We restarted Umbraco (single application pool) and the problem was resolved.
15 minutes from problem identification to resolution.
More strategic example. This view shows the PurePath page for a particular moment in time. We can see that the slowest transaction here is “View Order”, taking approx 3 seconds. This is a critical interaction on an ecommerce site: if the page takes too long to load, users will lose confidence in your ability to correctly process their order. Member confidence is critical: it helps to retain members and ensure that they keep coming back. This type of performance is as vital for Nisa as it is for traditional B2C websites. We can use dT to see where that time is spent.
By clicking on the “blob” we can highlight where the time is spent. The portion of the purepath represented by the largest “blob” is where the bulk of the time is spent.
In the lower part of the screen we can see that of 3.082 seconds, 1.567 seconds was ASP execution time. In this case though we know that this execution time can include db wait time. We often see a large elapsed time on rows which show a db query. This gives us an insight into the application behaviour under both normal and high load. Helps to give testers, operations people and developers a better understanding of the application.
***And also, of course, fixing the problem leads to better customer satisfaction, customer retention and ultimately more revenue!***
Tactical example. Purepaths can also be used for fault diagnosis. These 2 purepaths show performance for a particular aspx page before (green) and after (amber) an incident. View has been filtered to only contain these two queries. There are options to compare purepaths from different times to help diagnose problems. The two purepaths represent the same business process taking very different amounts of time to complete. LHS = 9.8s, RHS = 107s !!
In this case diagnosis was easy: I hovered my mouse cursor over the “block” in the purepath and it showed the name of the ASP control where the time was being spent. In this case it was a method called “RenderWeather”.
By clicking on the block, the method and class affected by the poor performance were highlighted, so I could contact the developers / support team quickly with an initial diagnosis. Again this was business-affecting because the slow page load time was preventing users from logging into the system and placing orders.
The third party site which provided the forecast was down. Because the calls to the external content were synchronous, users were forced to wait over 100 seconds for their login to succeed! We fixed this problem in the short term by removing the weather call. This then led us to consider some other longer term improvements for the home page. We made several calls asynchronous and so transparent to users. Users can now wait for content (if they want it) or click on to the next page if they’re in a rush (e.g. order entry).
***Gave control of the experience to the user***
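A minimal sketch of the principle, for anyone who wants to see it in code. It is not the change the Nisa developers made (the notes don’t give that detail): it uses a timeout on an optional third-party call, which is one related way to stop that call from blocking the page. The function names, the page contents and the timeout value are all hypothetical.

```python
import asyncio
from typing import Optional


async def fetch_weather(delay_s: float) -> str:
    """Stand-in for the third-party forecast call (hypothetical)."""
    await asyncio.sleep(delay_s)
    return "Sunny, 18C"


async def render_home(weather_delay_s: float, timeout_s: float = 0.05) -> dict:
    """Build the home page without letting the weather call hold up login."""
    page: dict = {"orders": "ready", "offers": "ready"}  # what users need to place orders
    weather: Optional[str]
    try:
        # Give the optional content a bounded amount of time, then move on.
        weather = await asyncio.wait_for(fetch_weather(weather_delay_s), timeout_s)
    except asyncio.TimeoutError:
        weather = None  # page still renders; the weather panel is simply left out
    page["weather"] = weather
    return page


fast = asyncio.run(render_home(weather_delay_s=0.0))  # provider healthy
slow = asyncio.run(render_home(weather_delay_s=5.0))  # provider very slow or down
```

In both cases the order-related content is available; only the optional panel differs.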
Red arrow indicates that this image scrolls for “miles”. Our investigation of the weather problem caused us to look at login specifically. We had already made performance improvements to the site but they had gone unnoticed by the majority of users. Perception was that “login was slow” therefore the “site was slow”. I looked at the sequence diagrams for the home page and could see “chattiness”: frequent repeated calls to the back-end. This is a screenshot that I sent to our developer in an email; my comment helped to draw the developer team’s attention to the importance of home page improvements. To counter this, the developers implemented more caching on a per-user and per-session basis.
As a result of the performance improvements to home.aspx, we achieved several things. On a technical level, 70-80% performance improvement on the login page. Best bit was “user perception”. Users commented on faster order entry etc. even though those improvements had been in for several weeks by the time this change was made. Against an increasing load (more than doubled hits/day since January) we are continuing to deliver faster response times. Still averaging 46% faster than at the start of the year!
Another example of performance optimisation is shown by another “screenshot”. I identified a correlation between poorly performing pages and a particular stored procedure in the database. All the pages outlined in red were performing poorly. Drilling down into them showed that the purepaths related to these pages were all timing out as they executed a certain stored procedure. This was linked to pages where certain adverts were published. Dev team rewrote the stored procedure, we re-tested and deployed into production and reduced response times for this page by 20%.
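To illustrate why per-session caching reduces “chattiness”, here is a small sketch. It is not the developers’ implementation; the class, the key names, the time-to-live and the loader function are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class SessionCache:
    """Tiny per-session cache with a time-to-live (illustrative only)."""

    def __init__(self, ttl_s: float = 300.0) -> None:
        self._ttl_s = ttl_s
        self._store: Dict[Tuple[str, str], Tuple[float, Any]] = {}

    def get(self, session_id: str, key: str, loader: Callable[[], Any]) -> Any:
        now = time.monotonic()
        entry = self._store.get((session_id, key))
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # cache hit: no back-end call
        value = loader()  # cache miss: exactly one back-end call
        self._store[(session_id, key)] = (now, value)
        return value


backend_calls = 0


def load_offers() -> list:
    """Hypothetical back-end call that the home page used to repeat."""
    global backend_calls
    backend_calls += 1
    return ["offer-1", "offer-2"]


cache = SessionCache()
for _ in range(5):  # five page renders in the same session
    cache.get("session-A", "offers", load_offers)
cache.get("session-B", "offers", load_offers)  # a different session loads its own copy
```

Six lookups result in two back-end calls, one per session.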
As well as reducing response times, more importantly we reduced load on the Oracle database. Database utilisation (measured in our test environment) showed a fall from 22% to 13% CPU utilisation.
Graph added yesterday!! Graphs show purepath response times for the OCS application after helpdesk performance improvements made on 10th July. Same time, 11:00 to 12:00 (peak hour). Same scale on graphs. Immediate visual indication of changes to response times.
As well as identifying current problems, can use dT to compare performance over time. E.g. this example shows a recurring performance issue where database problems (bad Oracle plan selection) affected application performance. dynaTrace helped to identify the characteristics of poor db performance and allowed us to differentiate between “bad days” and “busy days”. This problem has now been resolved. The lower chart, taken from June-July, shows more stable performance, fewer peaks. Also interesting to note the peaks on Monday-Friday, quieter weekends.
One way of improving performance has been to identify queries that are executed frequently or take a long time. dT helps us to identify these queries without bothering the DBAs. This reduces time taken to identify problems and fix them. Using dT to monitor tests allows us to fix problems in test environments before production DBAs identify problems with the customer-facing system.
As well as identifying slow database queries, can sort by “slowest pages”. Helps to identify next areas to focus on when planning performance improvements, e.g. in this case Order History – not crucial to placing orders, but frequently accessed by members. Blurred out potentially sensitive data.
Can customise dT in two ways. As well as labelling purepaths with the name of a business transaction, e.g. renaming order_entry/order.aspx to “Place Order”, can label transactions from performance test tools. In this case Facilita Forecast, but also supported by SilkPerformer and LoadRunner OOTB. And we’re now using Compuware Load360 to monitor synthetic user requests. Can simply add a custom header including the virtual user name, transaction name, script name etc. to help identify page requests and tie them down to business processes / test steps.
Quantifying performance improvements is important because user perceptions… Some ad-hoc improvements, limited feedback from members. When we improved home, people noticed other improvements that were already there. They may not notice some improvements whilst they’re hung up on another issue or perceived problem. Story behind each bullet point.
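The “bad day” versus “busy day” distinction can be expressed as a simple rule. This is a sketch of the reasoning only, not a dynaTrace feature; the function, the 25% tolerance and the example figures are all hypothetical.

```python
def classify_day(hits: int, avg_response_s: float,
                 baseline_hits: int, baseline_response_s: float,
                 tolerance: float = 0.25) -> str:
    """Label a day by comparing load and response time with a baseline.

    "bad"    = slower than baseline although load is not up (suspect a fault)
    "busy"   = load is clearly above baseline
    "normal" = neither of the above
    """
    busy = hits > baseline_hits * (1 + tolerance)
    slow = avg_response_s > baseline_response_s * (1 + tolerance)
    if slow and not busy:
        return "bad"
    if busy:
        return "busy"
    return "normal"


# Hypothetical figures against a baseline of 10,000 hits at 1.0 s average.
bad_day = classify_day(10_000, 3.0, 10_000, 1.0)
busy_day = classify_day(15_000, 1.4, 10_000, 1.0)
normal_day = classify_day(10_000, 1.0, 10_000, 1.0)
```

A slow day with normal load points towards a fault (such as a bad query plan) rather than demand.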
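A rough sketch of the ranking idea: combine how often a query runs with how long it takes, then sort by total time. dT provides this view itself; the code only shows the arithmetic behind it, and the function name, sample queries and timings are made up.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_queries(samples: Iterable[Tuple[str, float]]) -> List[Tuple[str, int, float]]:
    """Rank queries by total elapsed time.

    `samples` is an iterable of (sql_text, elapsed_seconds) pairs.
    Returns (sql_text, execution_count, total_seconds), slowest total first.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for sql, elapsed in samples:
        totals[sql][0] += 1
        totals[sql][1] += elapsed
    rows = [(sql, count, total) for sql, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical samples: a cheap query run often can outweigh one slow query.
samples = [
    ("SELECT offers", 0.5),
    ("SELECT offers", 0.5),
    ("SELECT offers", 0.5),
    ("SELECT order_history", 1.2),
]
ranked = rank_queries(samples)
```

Here the frequent query tops the list even though each execution is faster than the single slow one.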
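The header-based tagging can be pictured like this. The header name and the field format below are placeholders invented for the example; the real header name and syntax that dT expects should be taken from the dynaTrace documentation or from the load test tool’s integration.

```python
def build_tag_header(virtual_user: str, transaction: str, script: str) -> dict:
    """Build a request header that tags a load-test request.

    "X-Test-Tag" and the VU/NA/SN field names are hypothetical placeholders,
    not the header or format that dynaTrace defines.
    """
    value = f"VU={virtual_user};NA={transaction};SN={script}"
    return {"X-Test-Tag": value}


# Hypothetical test step: virtual user 17 placing an order.
headers = build_tag_header("vu-017", "Place Order", "order_entry_script")
```

The test tool sends these headers with each request, so a page request can be tied back to a business process or test step.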
Fewer complaints requiring human intervention *** Fewer complaints = improved service! *** *** Less human intervention = time and money saved ***
Less time spent bug hunting *** = IT staff time is being better spent ***
*** Better user experience *** Reintroduction of targeted ads, better for members, better for Nisa (improving sales etc.) *** Example of the 3rd party weather plugin ***
*** Overall *** Better customer experience -> better client / member retention. *** More sales. Nisa Retail has fewer customers than a B2C ecommerce site, but that makes each customer that much more important! Each store is a significant revenue stream – by delivering great performance and user experience we keep them coming back. They know they can rely on us. ***
Current situation – partial view – single server monitored. Useful but limited when asked “can you see poor performance”, especially if only a single user or server is affected
Will add dT agents to the other servers in the web farm to give a more complete view.
Continue to work on performance improvements, targeting db and front-end heavy/frequent transactions. Working with DBAs to identify poorly performing queries and optimise performance against continually increasing demands, in terms of both application complexity and user load. Increased scope. Longer term analysis and trending. Business and IT Dashboard – integration with other monitoring tools such as Gomez APM and HP Sitescope.
How Nisa Retail improve service & cut costs through APM
Nisa Retail: Improving Service
and Cutting Costs
David Morris & Paul Smith, Nisa Retail
Richard Bishop, Intechnica
Nisa Business Overview
UK’s leading member-owned organisation
• Mutual organisation, member owned, operates like a co-operative
• Collective buying power to reduce costs for members
> 1000 member shareholders
> 3750 stores nationwide
> £1.3bn turn-over
Importance of APM for Nisa
Cut off points
Member / customer retention
Increasing load and app complexity
High customer expectations
Continual improvements required
Intechnica: digital performance
A digital consultancy specialising in
online application development & performance
Consultancy – IT strategy and BPM
Performance improvement and assurance
Long term improvements
Identifying slowest and most frequent db calls
Long term improvements
Identifying slowest pages
Understanding performance tests
Monitor test as well as production environments
Tagged web requests help to identify business transactions
Use APM to get an understanding of application behaviour and performance
• In production and test environments
• Assists in fault diagnosis, reducing diagnosis and fix times
• Directs performance optimisation efforts
• Helps differentiate between “bad” and “busy” days
• Quantifies performance improvements
Developer time is better spent
Better user experience
Overall: member retention