STP 2014 - Let's Learn from the Top Performance Mistakes in 2013

Presentation given at STPCon 2014. It highlights the top performance problems seen in 2013 and how we can identify these problems in dev & test instead of waiting until the app crashes in production

Slide notes
  • … if it doesn’t scale
  • When we look at the results of your testing framework build over build, we can easily spot functional regressions. In our example, testPurchase fails in Build 18. We notify the developer, the problem gets fixed, and with Build 19 we are back to functional correctness.

    Looking behind the scenes: the problem is that functional testing only verifies the behavior visible to the caller of the tested function. Using dynaTrace we can analyze the internals of the tested code, with metrics such as the number of executed SQL statements, number of exceptions thrown, time spent on CPU, memory consumption, number of remoting calls, transferred bytes, and more.

    In Build 18 we see a clear correlation between exceptions and the failed functional test, so we can assume one of those exceptions caused the failure. Exception details would help the developer quickly identify the root cause and fix the problem faster.

    In Build 19 the testing framework reports ALL GREEN. But behind the scenes we see a big jump in SQL statements as well as CPU usage. What just happened? The developer fixed the functional problem but introduced an architectural regression. This needs to be investigated; otherwise the change will have a negative impact on the application once it is tested under load.

    In Build 20 all of these problems are fixed. We still meet our functional goals and are back to an acceptable number of SQL statements, exceptions, and CPU usage.
  • Web architectural metrics: # of JS files, # of CSS files, # of redirects, size of images

    1. LET’S LEARN FROM THE TOP PERF MISTAKES @grabnerandi http://apmblog.compuware.com
    2. What to do with the fastest car …
    3. … if it fails to reach the finish line
    4. What to do with millions of $$ for building a web site …
    5. Performance, Scalability & Architecture
    6. #1: Architectural Decisions
    7. #1: “We want more Web 2.0”
    8. #1: Load Test Prior to Change
    9. #1: Load Test After Change
    10. Metrics: # Visitors, # Requests / User. Business: Do we need all these bells and whistles?
    11. #2: Disconnected Teams
    12. #2: “Teamwork” between Dev and Ops. A SEV1 problem in production: Need access to the log files. Where are they? Can’t get them. Need to increase the log level. Can’t do! Config files can’t be changed in prod!
    13. #2: Solution: Implement a custom “on demand” remote logger
    14. #2: Implementation and Rollout: Implemented the custom logger; it worked well in load testing.
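The talk does not show the logger's code, but the idea from the slide above can be sketched as follows: a logger whose verbosity ops can raise at runtime (for example from a remote admin console) instead of editing config files on production servers. The class and method names here are illustrative assumptions, not the team's actual implementation.

```python
import logging

class OnDemandLogger:
    """Sketch of an 'on demand' remote logger: the log level can be raised
    at runtime without touching config files on the production servers."""

    def __init__(self, name: str):
        self._logger = logging.getLogger(name)
        self._logger.setLevel(logging.WARNING)  # quiet by default

    def set_level(self, level: str) -> None:
        # Hypothetical hook, called remotely by ops when a SEV1 needs detail.
        self._logger.setLevel(getattr(logging, level.upper()))

    def is_debug_enabled(self) -> bool:
        return self._logger.isEnabledFor(logging.DEBUG)
```

As the next slides show, the risk is not the idea itself but where the log records end up: a shared, synchronized log sink turns every log call into a contention point.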
    15. #2: What happened? ~1 million lock exceptions in 30 minutes
    16. #2: Root Cause: a special WebSphere setting! The log service provides a single synchronized log file across ALL JVMs.
    17. Metrics: # Log Messages, # Exceptions. Share: the same server settings across environments.
    18. #3: Implementation Flaws
    19. #3: Business Impact requires Action!
    20. #3: Solution: Cache to the RESCUE!!
    21. #3: Implementation and Rollout: Implemented an in-memory cache; it worked well in load testing.
    22. #3: Result: Out of Memory crashes!! The “fixed” version was deployed (“problem fixed!”), yet it still crashes.
    23. Metrics: Heap Size, # Objects Allocated, # Objects in Cache, Cache Hit Ratio. Test: with realistic data.
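A minimal sketch of the fix implied by these metrics (not code from the talk): an in-memory cache with a hard size bound and a hit-ratio counter. An unbounded cache filled with realistic production data volumes is exactly what causes the Out of Memory crashes described above; a bound plus LRU eviction keeps heap size predictable.

```python
from collections import OrderedDict

class BoundedCache:
    """Illustrative LRU cache with a size bound and hit-ratio metric."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking `len(self._data)` and `hit_ratio()` in load tests with realistic data is what surfaces this class of problem before production does.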
    24. #4: Push without a Plan
    25. #4: Mobile landing page of a Super Bowl ad: 434 resources in total on that page (230 JPEGs, 75 PNGs, 50 GIFs, …), with a total size of ~20MB
    26. #4: m.store.com redirects to www.store.com, and ALL CSS and JS files are redirected to the www domain. That is a lot of time “wasted”, especially on high-latency mobile connections.
    27. #4: Critical pages not optimized! Browse, Search and Product Info perform well. Critical pages such as the Shopping Cart are very slow because they don’t follow best practices: 87 requests, 28 redirects, …
    28. Metrics: Load Time, # Resources (images, …), # HTTP 3xx, 4xx, 5xx. Dev: build for mobile. Test: test on mobile.
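The metrics on this slide can be computed from any recorded page load, e.g. a HAR export. The helper below is a hypothetical sketch (not from the talk): given responses as `(url, status_code, size_bytes)` tuples, it tallies resource count, total page weight, and the HTTP 3xx/4xx/5xx counts that expose redirect chains and broken resources.

```python
from collections import Counter

def page_metrics(responses):
    """Compute resource count, total bytes, and HTTP status-class counts
    from recorded (url, status_code, size_bytes) tuples."""
    status_classes = Counter(f"{status // 100}xx" for _, status, _ in responses)
    return {
        "resources": len(responses),
        "total_bytes": sum(size for _, _, size in responses),
        "3xx": status_classes.get("3xx", 0),
        "4xx": status_classes.get("4xx", 0),
        "5xx": status_classes.get("5xx", 0),
    }
```

On the Super Bowl page above, such a check would have flagged 434 resources, ~20MB, and a redirect for every CSS and JS file long before the ad aired.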
    29. #5: “Blindly” (Re)use Existing Components
    30. #5: Requirement: We need a report
    31. #5: Using Hibernate results in 4k+ SQL statements to display 3 items! Each individual statement executes VERY FAST, but the total SUM takes 6s.
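The pattern behind "4k+ statements for 3 items" is the classic ORM N+1 problem: one query per parent object plus one per lazily loaded child, instead of a single joined query. The sketch below simulates it with a hypothetical statement-counting stand-in for the database; the table and function names are illustrative, not from the talk.

```python
class CountingDb:
    """Stand-in for a database connection that counts executed statements."""
    def __init__(self):
        self.statements = 0

    def execute(self, sql):
        self.statements += 1
        return []

def load_report_n_plus_one(db, order_ids, lines_per_order):
    # One query per order, then one query per line item as the ORM
    # lazily resolves each reference: len(order_ids) * (1 + lines_per_order).
    for oid in order_ids:
        db.execute(f"SELECT * FROM orders WHERE id = {oid}")
        for _ in range(lines_per_order):
            db.execute("SELECT * FROM order_lines WHERE order_id = ?")

def load_report_joined(db, order_ids):
    # A single joined (fetch-joined, in Hibernate terms) query instead.
    db.execute("SELECT * FROM orders o JOIN order_lines l ON l.order_id = o.id")
```

Counting statements per request, as the next metrics slide suggests, makes the difference between the two shapes impossible to miss.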
    32. #5: Requirement: We need a fancy UI
    33. #5: Using Telerik controls results in 9s for data binding of the UI controls. Cause #1: a slow stored procedure whose execution time varies between 1 and 7.5s depending on the request. Cause #2: 240(!) similar SQL statements, most of them not prepared and differing only in things like column names.
    34. Metrics: # Total SQLs, # SQLs / Web Request, # Same SQLs / Request, Transferred Rows. Test: with realistic data. Dev: “learn” your frameworks.
    35. #6: No “Agile” Deployment
    36. #6: Load spike resulted in unavailability
    37. #6: Alternative: “GoDaddy goes DevOps”, from 1h before the Super Bowl kickoff to 1h after the game ended
    38. #6: Behind the Scenes
    39. Metrics: Availability, Page Size, # Objects, # Hosts, # Connections. DevOps: “feature” switches.
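The "feature switches" on this slide can be sketched as a simple load-shedding rule: when traffic exceeds capacity, optional features are switched off at runtime so the critical path stays available. The feature names and threshold logic below are illustrative assumptions, not GoDaddy's actual mechanism.

```python
FEATURES = {"recommendations": True, "live_chat": True, "checkout": True}

def apply_load_shedding(current_rps: int, max_rps: int) -> dict:
    """Disable optional features when requests/second exceed capacity,
    keeping the critical 'checkout' path enabled."""
    overloaded = current_rps > max_rps
    return {
        name: (enabled and not (overloaded and name != "checkout"))
        for name, enabled in FEATURES.items()
    }
```

The point is the deployment model: the switch flips at runtime, with no redeploy, which is what lets a site degrade gracefully through a Super Bowl load spike instead of going down.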
    40. What have we learned today?
    41. # of Requests / User, # of Log Messages, # of Exceptions, # Objects Allocated, # Objects in Cache, Cache Hit Ratio, # of Images, # of SQLs, # SQLs per Request, Availability, # HTTP 3xx, 4xx, Page Size
    42. A final thought …
    43. How about this idea? Combine test framework results with architectural data, build over build:

        Build # | Test Case    | Status | # SQL | # Excep | CPU
        17      | testPurchase | OK     | 12    | 0       | 120ms
        17      | testSearch   | OK     | 3     | 1       | 68ms
        18      | testPurchase | FAILED | 12    | 5       | 60ms
        18      | testSearch   | OK     | 3     | 1       | 68ms
        19      | testPurchase | OK     | 75    | 0       | 230ms
        19      | testSearch   | OK     | 3     | 1       | 68ms
        20      | testPurchase | OK     | 12    | 0       | 120ms
        20      | testSearch   | OK     | 3     | 1       | 68ms

        Build 18: we identified a regression; looking behind the scenes, the exceptions are probably the reason for the failed test. Build 19: the problem is fixed, but now we have an architectural regression (SQL count and CPU jump). Build 20: the problem is solved, and we have both functional and architectural confidence.
    44. How? Performance focus in test automation: analyze all unit/performance tests, analyze the performance metrics, identify regressions, and look at the cross impact of KPIs.
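The build-over-build check from these final slides can be sketched as a comparison of each test's architectural metrics against the previous build, flagging jumps even when the functional result is green. The metric names and the 50% threshold below are illustrative assumptions.

```python
def find_architectural_regressions(prev, curr, threshold=0.5):
    """Return metrics that grew by more than `threshold` (50% by default)
    between two builds. `prev` and `curr` map metric name -> value,
    e.g. {"sql_statements": 12, "exceptions": 0, "cpu_ms": 120}."""
    regressions = {}
    for metric, old in prev.items():
        new = curr.get(metric, old)
        baseline = old if old else 1  # avoid division by zero on a zero baseline
        if (new - old) / baseline > threshold:
            regressions[metric] = (old, new)
    return regressions
```

Run against the table above, such a check stays quiet for the green Build 20 but flags Build 19's jump from 12 to 75 SQL statements despite its passing tests.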
    45. More Info
        • My blog: http://apmblog.compuware.com
        • Tweet about it: @grabnerandi
        • dynaTrace Enterprise: full end-to-end visibility into your Java, .NET, and PHP apps. Sign up for a 15-day free trial at http://compuwareapm.com
        • dynaTrace AJAX Edition: browser diagnostics for IE + FF. Download @ http://ajax.dynatrace.com
    46. THANK YOU @grabnerandi
