Nisa Retail: Improving Service
and Cutting Costs
David Morris & Paul Smith, Nisa Retail
Richard Bishop, Intechnica
Nisa Business Overview
▪ UK’s leading member-owned organisation
• Mutual organisation, member owned, operates like a co-op...
IT Context
IT Context
Importance of APM for Nisa
▪ Cut off points
▪ Member / customer retention
▪ User satisfaction
▪ Increasing load and app co...
Intechnica: digital performance
A digital consultancy specialising in
online application development & performance
▪ Consu...
PurePath
dynaTrace Implementation at Nisa
dynaTrace
Server
dynaTrace
Client
Tactical Approaches
Strategic Performance Improvements
OCS Login Difficulties
Drill down to errors
Drill back up to PurePaths
View Order – Bottleneck
View Order – Bottleneck
View Order – Bottleneck
Diagnosing Third-party Faults
• Hover over “block”
• Name of ASP control identified
▪ Click PurePath Hotspot
▪ Highlight Method and Class experiencing performance problem
Diagnosing third-party faults
No forecast
Application Exception
No forecast
Site offline
Other improvements for /home.aspx
From fault to fix
[Chart: response times and hits for /home.aspx, H1 2012. Axes: hits/day and response time (s), by date.]
Performance optimisation
Performance optimisation
10th July 2012
11th July 2012
Long term trends
Long term improvements
▪ Identifying slowest and most frequent db calls
Long term improvements
▪ Identifying slowest pages
Understanding performance tests
▪ Monitor test as well as production environments
▪ Tagged web requests help to identify b...
Key points
▪ Use APM to get an understanding of application behaviour and performance
• In production and test environment...
Business Value
▪ Fewer complaints
▪ Developer time is better spent
▪ Better user experience
▪ Overall: member retention
Next Steps
Web: www.intechnica.co.uk
Email: more@intechnica.co.uk
Tel: 0845 680 9679
Fax: 0845 2991647
Address: Fourways House, 4th F...

How Nisa Retail improve service & cut costs through APM

Find out how Nisa Retail cuts support costs and boosts long-term client retention with IT performance experts Intechnica, through using Compuware dynaTrace APM.

Published in: Technology, Business
  • OCS and Helpdesk are the applications monitored using dynaTrace
    Core system based on Oracle (Retek) RMS
    OCS = Order Capture System
    Our key area of interest is OCS, which is used by Telesales and external customers, but we also monitor the helpdesk application
    Order entry is via EPOS devices and WebUI
    OCS also gives access to Order History, Product Information, Special Offers etc.
    Helpdesk and OCS share a common platform
  • Internal and external clients
    Low volume, high complexity
    Multi-tier architecture
  • Cut off points – significant peaks in demand
    Member / customer retention is critical to the business
    User satisfaction is key to customer retention
    Load on the application is increasing over time as the business grows. Application complexity is also increasing, and users are spending longer on the site.
    Load on the website is continually increasing (+20% this year)
    Higher customer expectations: other retail sites are fast, which drives expectations for this site. Continual improvements are required (46% faster this year, but this trend needs to continue)
  • Intechnica are Compuware partners.
    Specialising in application development and performance
    Consultancy – IT strategy and BPM (BPM = Business Process Management)
    Involved in application development – Microsoft and AWS partner
    Performance improvement and assurance – developers and testers working together to improve clients’ application performance
    We provide cloud consultancy in three areas – cloud environment provisioning, cloud app dev and cloud testing.

    Intechnica has been working with Nisa for 6 years and is currently involved in
    Strategic performance improvements
    Tactical problem resolution

    … The way we’re able to do both of those things is via the dynaTrace PurePath…
  • PurePath is part of the dynaTrace core technology. A PurePath helps to identify performance down to a transaction level and stores relevant information, including timing information, through each application layer.
    Helps to identify what proportion of time is spent in network, application tier, database, web servers etc. This includes tracing through public, private, hybrid and heterogeneous (.NET/Java) cloud environments. Can also track synthetic users through an application to measure performance even when your users aren’t online.


    Mention: a good introduction to PurePaths is also available on YouTube, “dynaTrace PurePath Technology 2-Minute Explainer” - http://www.youtube.com/watch?v=V_Ydfows_xo
    As well as identifying poor performing transactions, can drill down into code and look at sequence diagrams, highlighting queries that could be cached to improve performance etc. Gives visibility of system issues and assists in real time monitoring.
  • Here’s where we get our PurePaths from!

    Our dynaTrace server has its own SQL installation, sessions store and also hosts the collector.
    As I mentioned earlier, high complexity but low volume.
    Larger environments with more transaction monitoring require more servers to host dT. A sizing guide is available in the documentation, or come and ask us / Compuware.
    Currently we only monitor a single server; we plan to increase scope in the near future…

    On a tactical level, this enables us to…
  • Tactical - analysis of an outage: Functional Health, error drill down, identifying problem pages. Will take you through this in a “real life” example later.

    Sites are getting more complex and involve 3rd party content; dT can help identify external problems. I have an example of this…

    Can also use e.g. Layer breakdown to diagnose a problem with a particular application tier. E.g. in this case blue = ADO time (ActiveX Data Objects), Microsoft’s database client. Increases in this layer whilst others remain constant could be indicative of poor DB performance.
    dT can help with identifying problems, all the way from “Fault to Fix”.
  • Strategic – long term view. Identifying “db heavy” transactions. Identifying all slow pages and ranking them helps with prioritising performance optimisation work.
    Quantifying “slowness”, e.g. Is it just today? Are we busier than normal? Or is something wrong?
    Performance test analysis: monitor performance tests to get a deep understanding of applications before they go live. Quantifying improvements through the test lifecycle and into production. Moving to longer term analysis.

    Next slides take us through some “real world” scenarios at Nisa.
  • Tactical example. Received a phone call to say that members were complaining of difficulties logging into OCS. 14:15 - approaching afternoon cut-off, so a critical time for the business.
    Inability to place orders causes delays to the daily order and has knock-on effects throughout the business: warehousing, stock control, just-in-time ordering etc.
    Very efficient warehouse operation, saves costs through efficiencies (JIT etc.) but tight schedules for deliveries / despatches, over 1000 vehicle movements per shift.
  • This screenshot was taken “after the event”, because during the triage of the problem we didn’t hang around to take screenshots! Approx 5 minutes into the problem, clicked on the “red bar”. Drilled down to the errors page. Could see that we had experienced 16 errors in this 5-minute period.
    Not a crisis at this stage but a definite cause for concern.
  • Drilling down from the errors to the PurePaths view showed that all the errors were related to the home page.
    We drilled further and found that the problem was related to Umbraco (which handles the customer-specific content). We restarted Umbraco (single application pool) and the problem was resolved.

    15 minutes from problem identification to resolution.

  • More strategic example. This view shows the PurePath page for a particular moment in time. We can see that the slowest transaction here is “View Order” taking approx 3 seconds.
    This is a critical interaction on an ecommerce site: if the page takes too long to load, users will lose confidence in your ability to correctly process their order.
    Member confidence is critical (member retention): it helps to retain members and ensure that they keep coming back. This type of performance is vital for both Nisa and traditional B2C websites.
    We can use dT to see where that time is spent.
  • By clicking on the “blob” we can highlight where the time is spent.
    The portion of the PurePath represented by the largest “blob” is where the bulk of the time is spent.
  • In the lower part of the screen we can see that of 3.082 seconds, 1.567 seconds was ASP execution time.
    In this case, though, we know that this execution time can include db wait time.
    We often see a large elapsed time on rows which show a db query.
    This gives us an insight into the application behaviour under both normal and high load. Helps to give testers, operations people and developers a better understanding of the application.

    ***And also, of course, fixing the problem leads to better customer satisfaction, customer retention and ultimately more revenue!***
  • Tactical example. PurePaths can also be used for fault diagnosis. These 2 PurePaths show performance for a particular .aspx page before (green) and after (amber) an incident.
    View has been filtered to only contain these two queries. There are options to compare PurePaths from different times to help diagnose problems.
    The two PurePaths represent the same business process taking very different amounts of time to complete.
    LHS = 9.8s RHS = 107s !!
  • In this case diagnosis was easy: I hovered my mouse cursor over the “block” in the PurePath and it showed the name of the ASP control where the time was being spent.
    In this case it was a method called “RenderWeather”.
  • By clicking on the block, the method and class affected by the poor performance were highlighted so I could contact the developers / support team quickly with an initial diagnosis.
    Again this was business affecting because the slow page load time was preventing users from logging into the system and placing orders.
  • The third party site which provided the forecast was down. Because the calls to the external content were synchronous, users were forced to wait over 100 seconds for their login to succeed! We fixed this problem in the short term by removing the weather call. This then led us to consider some other longer term improvements for the home page. We made several calls asynchronous and so transparent to users.
    Users can now wait for content (if they want it) or click on to the next page if they’re in a rush (e.g. order entry).

    ***Gave control of the experience to the user***
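One way to stop an optional third-party call from blocking a page is to bound it with a timeout and fall back to a placeholder. The sketch below illustrates that idea in Python with `asyncio`; it is a swapped-in illustration, not the asynchronous loading the team implemented in the ASP.NET application. The function names, the simulated delays and the 0.2 second timeout are all assumptions.

```python
import asyncio


async def fetch_weather(delay_s: float) -> str:
    """Stand-in for the third-party forecast call (hypothetical)."""
    await asyncio.sleep(delay_s)  # simulate network latency
    return "Sunny"


async def render_home(weather_delay_s: float, timeout_s: float = 0.2) -> dict:
    """Build the home page without letting optional content block login."""
    page = {"orders": "ready"}  # core content is always rendered
    try:
        # Give the optional call a bounded amount of time to answer.
        page["weather"] = await asyncio.wait_for(
            fetch_weather(weather_delay_s), timeout=timeout_s
        )
    except asyncio.TimeoutError:
        # Degrade gracefully instead of making the user wait.
        page["weather"] = "No forecast"
    return page


if __name__ == "__main__":
    print(asyncio.run(render_home(0.01)))  # forecast arrives in time
    print(asyncio.run(render_home(5.0)))   # forecast provider is too slow
```

With a slow provider the page still renders after the timeout, with "No forecast" in place of the weather panel.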
  • Red arrow indicates that this image “scrolls for miles”.
    Our investigation of the weather problem caused us to look at login specifically. We had already made performance improvements to the site but they had gone unnoticed by the majority of users. Perception was that “login was slow” therefore the “site was slow”. I looked at the sequence diagrams for the home page and could see “chattiness”: frequent repeated calls to the back-end. This is a screenshot that I sent to our developer in an email; my comment helped to draw the developer team’s attention to the importance of home page improvements.
    To counter this, the developers implemented more caching on a per-user and per-session basis.
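A minimal sketch of per-user, per-session caching follows, assuming a simple in-memory store. The class, the loader function and the key layout are hypothetical; the deck does not show how the developers implemented their cache.

```python
from typing import Callable, Dict, Tuple


class SessionCache:
    """Cache back-end results keyed by (user, session, item)."""

    def __init__(self, loader: Callable[[str, str], str]) -> None:
        self._loader = loader
        self._store: Dict[Tuple[str, str, str], str] = {}
        self.backend_calls = 0  # counts real back-end round trips

    def get(self, user_id: str, session_id: str, item: str) -> str:
        key = (user_id, session_id, item)
        if key not in self._store:
            # Only the first request in a session reaches the back-end.
            self.backend_calls += 1
            self._store[key] = self._loader(user_id, item)
        return self._store[key]

    def end_session(self, user_id: str, session_id: str) -> None:
        """Drop everything cached for one session."""
        for key in [k for k in self._store if k[:2] == (user_id, session_id)]:
            del self._store[key]


def load_from_backend(user_id: str, item: str) -> str:
    """Hypothetical slow back-end call."""
    return f"{item} for {user_id}"


if __name__ == "__main__":
    cache = SessionCache(load_from_backend)
    for _ in range(3):
        cache.get("member1", "s1", "offers")
    print(cache.backend_calls)  # repeated requests share one back-end call
```

Keying on the session as well as the user means cached content is discarded when the session ends, so a member sees fresh data at next login.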
  • As a result of the performance improvements to home.aspx, we achieved several things. On a technical level, 70-80% performance improvement on the login page.
    Best bit was “user perception”. Users commented on faster order entry etc. even though those improvements had been in for several weeks by the time this change was made.
    Against an increasing load (more than doubled hits/day since January) we are continuing to deliver faster response times. Still averaging 46% faster than at the start of the year!
  • Another example of performance optimisation is shown by another “screenshot”. I identified a correlation between poor performing pages and a particular stored procedure in the database. All the pages outlined in red were performing poorly. Drilling down into them showed that the PurePaths related to these pages were all timing out as they executed a certain stored procedure. This was linked to pages where certain adverts were published. The dev team rewrote the stored procedure; we re-tested, deployed into production and reduced response times for this page by 20%.

    As well as reducing response times, more importantly we reduced load on the Oracle database.
    Database utilisation (measured in our test environment) showed a fall from 22% to 13% CPU utilisation.
  • Graph added yesterday!! Graphs show PurePath response times for the OCS application after helpdesk performance improvements made on 10th July.
    Same time 11:00 to 12:00 (peak hour). Same scale on graphs.
    Immediate visual indication of changes to response times.
  • As well as identifying current problems, can use dT to compare performance over time. E.g. this example shows a recurring performance issue where database problems (bad Oracle plan selection) affected application performance.
    DynaTrace helped to identify the characteristics of poor db performance and allowed us to differentiate between “bad days” and “busy days”.
    This problem has now been resolved. The lower chart taken from June-July shows more stable performance, fewer peaks. Also interesting to note the peaks on Monday-Friday, quieter weekends.
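The “bad day” versus “busy day” distinction can be sketched as a comparison of each day's load and response time against a baseline. This is an illustrative heuristic, not how dynaTrace makes the comparison: the thresholds, the use of the median as baseline and the sample figures are all assumptions.

```python
from statistics import median
from typing import Dict, List, Tuple

Day = Tuple[str, int, float]  # (date, hits, average response time in seconds)


def classify_days(days: List[Day], load_factor: float = 1.5,
                  slow_factor: float = 1.5) -> Dict[str, str]:
    """Label each day as 'bad', 'busy' or 'normal' relative to the median."""
    base_hits = median(d[1] for d in days)
    base_rt = median(d[2] for d in days)
    labels: Dict[str, str] = {}
    for date, hits, rt in days:
        busy = hits > load_factor * base_hits
        slow = rt > slow_factor * base_rt
        if slow and not busy:
            labels[date] = "bad"     # slow without extra load: something is wrong
        elif busy:
            labels[date] = "busy"    # load explains any slowdown
        else:
            labels[date] = "normal"
    return labels


if __name__ == "__main__":
    # Invented sample data for illustration only.
    sample = [("Mon", 3000, 2.0), ("Tue", 3100, 2.1), ("Wed", 2900, 6.5),
              ("Thu", 5200, 2.4), ("Fri", 3050, 2.0)]
    print(classify_days(sample))
```

In the sample, Wednesday is slow at normal load and is flagged as a bad day, while Thursday carries unusually high load and is flagged as busy.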
  • One way of improving performance has been to identify queries that are executed frequently or take a long time. dT helps us to identify these queries without bothering the DBAs. This reduces time taken to identify problems and fix them. Using dT to monitor tests allows us to fix problems in test environments before production DBAs identify problems with the customer-facing system.
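Ranking queries by total time shows why frequency matters as much as individual slowness. The sketch below assumes a plain list of (query name, duration) records; the query names and timings are invented, and this is not the dynaTrace database view itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rank_queries(
    calls: Iterable[Tuple[str, float]]
) -> List[Tuple[str, int, float, float]]:
    """Return (query, count, total_ms, avg_ms), highest total time first."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for query, ms in calls:
        totals[query][0] += 1   # number of executions
        totals[query][1] += ms  # accumulated elapsed time
    rows = [(q, int(c), t, t / c) for q, (c, t) in totals.items()]
    return sorted(rows, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Invented sample data for illustration only.
    calls = ([("get_offers", 40.0)] * 50
             + [("order_history", 900.0)] * 2
             + [("get_price", 5.0)] * 10)
    for row in rank_queries(calls):
        print(row)
```

In the sample, the quick but very frequent `get_offers` query costs more in total than the slow but rare `order_history` query, so it ranks first.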
  • As well as identifying slow database queries, can sort by “slowest pages”. Helps to identify next areas to focus on when planning performance improvements.
    e.g. in this case Order History – not crucial to placing orders, but frequently accessed by members.
    Blurred out potentially sensitive data
  • Can customise dT in two ways. As well as labelling PurePaths with the name of a business transaction, e.g. renaming order_entry/order.aspx to “Place Order”, can label transactions from performance test tools. In this case Facilita Forecast, but SilkPerformer and LoadRunner are also supported out of the box.
    And we’re now using Compuware Load360 to monitor synthetic user requests. Can simply add a custom header including the virtual user name, transaction name, script name etc. to help identify page requests and tie them down to business processes / test steps.
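Request tagging of this kind can be sketched as a helper that builds the extra header for each test step. The header name `X-Test-Tag` and the `VU=...;TX=...;SC=...` value layout are hypothetical, chosen only for illustration; the header name and format that dynaTrace actually expects should be taken from its documentation.

```python
from typing import Dict

TAG_HEADER = "X-Test-Tag"  # hypothetical header name, not the real dynaTrace one


def tag_value(virtual_user: str, transaction: str, script: str) -> str:
    """Pack the identifying fields into one header value (assumed layout)."""
    return f"VU={virtual_user};TX={transaction};SC={script}"


def tagged_headers(virtual_user: str, transaction: str, script: str,
                   base: Dict[str, str] = None) -> Dict[str, str]:
    """Return request headers with the test tag added, leaving `base` untouched."""
    headers = dict(base or {})
    headers[TAG_HEADER] = tag_value(virtual_user, transaction, script)
    return headers


if __name__ == "__main__":
    print(tagged_headers("vu-007", "Place Order", "order_entry",
                         base={"Accept": "text/html"}))
```

A load test script would pass these headers with every request in a step, so that the server-side trace for each page can be matched to the virtual user and test step that produced it.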
  • Quantifying performance improvements is important because user perceptions… Some ad-hoc improvements drew limited feedback from members. When we improved the home page, people noticed other improvements that were already there. They may not notice some improvements whilst they’re hung up on another issue or perceived problem.
    Story behind each bullet point 
  • Fewer complaints requiring human intervention
    *** Fewer complaints = improved service! ***
    *** Less human intervention = time and money saved ***

    Less time spent bug hunting *** = IT staff time is being better spent ***

    *** Better user experience***
    Reintroduction of targeted ads, better for members, better for Nisa (improving sales etc.)
    *** Example of the 3rd party weather plugin ***

    ***Overall *** Better customer experience -> better client / member retention
    ***More sales
    Nisa Retail has fewer customers than a B2C ecommerce site, but that makes each customer that much more important!
    Each store is a significant revenue stream – by delivering great performance and user experience we keep them coming back. They know they can rely on us.***

  • Current situation – partial view – single server monitored.
    Useful but limited when asked “can you see poor performance?”, especially if only a single user or server is affected.

    Will add dT agents to the other servers in the web farm to give a more complete view.

    Continue to work on performance improvements, targeting db and front-end heavy/frequent transactions. Working with DBAs to identify poor performing queries and optimise performance against continually increasing demands, in terms of both application complexity and user load.
    Increased scope
    Longer term analysis and trending
    Business and IT Dashboard – integration with other monitoring tools such as Gomez APM and HP Sitescope
  • How Nisa Retail improve service & cut costs through APM

    1. Nisa Retail: Improving Service and Cutting Costs David Morris & Paul Smith, Nisa Retail Richard Bishop, Intechnica
    2. Nisa Business Overview ▪ UK’s leading member-owned organisation • Mutual organisation, member owned, operates like a co-operative • Collective buying power to reduce costs for members ▪ > 1000 member shareholders ▪ > 3750 stores nationwide ▪ > £1.3bn turn-over
    3. IT Context
    4. IT Context
    5. Importance of APM for Nisa ▪ Cut off points ▪ Member / customer retention ▪ User satisfaction ▪ Increasing load and app complexity ▪ Increasing load ▪ High customer expectations ▪ Continual improvements required
    6. Intechnica: digital performance A digital consultancy specialising in online application development & performance ▪ Consultancy – IT strategy and BPM ▪ Application development ▪ Performance improvement and assurance ▪ Cloud consultancy
    7. PurePath
    8. dynaTrace Implementation at Nisa dynaTrace Server dynaTrace Client
    9. Tactical Approaches
    10. Strategic Performance Improvements
    11. OCS Login Difficulties
    12. Drill down to errors
    13. Drill back up to PurePaths
    14. View Order – Bottleneck
    15. View Order – Bottleneck
    16. View Order – Bottleneck
    17. Diagnosing Third-party Faults
    18. • Hover over “block” • Name of ASP control identified
    19. ▪ Click PurePath Hotspot ▪ Highlight Method and Class experiencing performance problem
    20. Diagnosing third-party faults No forecast Application Exception No forecast Site offline
    21. Other improvements for /home.aspx
    22. From fault to fix [Chart: Response times and hits, /home.aspx, H1 2012. Series: Avg. Page Load Time (sec) and Pageviews. Axes: Hits/day and Response time (s), 12/23/2011 to 7/10/2012]
    23. Performance optimisation
    24. Performance optimisation 10th July 2012 11th July 2012
    25. Long term trends
    26. Long term improvements ▪ Identifying slowest and most frequent db calls
    27. Long term improvements ▪ Identifying slowest pages
    28. Understanding performance tests ▪ Monitor test as well as production environments ▪ Tagged web requests help to identify business transactions
    29. Key points ▪ Use APM to get an understanding of application behaviour and performance • In production and test environments • Assists in fault diagnosis, reducing diagnosis and fix times • Directs performance optimisation efforts • Helps differentiate between “bad” and “busy” days • Quantifies performance improvements
    30. Business Value ▪ Fewer complaints ▪ Developer time is better spent ▪ Better user experience ▪ Overall: member retention
    31. Next Steps
    32. Web: www.intechnica.co.uk Email: more@intechnica.co.uk Tel: 0845 680 9679 Fax: 0845 2991647 Address: Fourways House, 4th Floor, 57 Hilton Street, Manchester, M1 2EJ Questions
