
OutSystems - A Framework Approach for Troubleshooting - NextStep 2012

Great applications can only be great when they respond at the speed business demands. In this session you will learn from real-world experience in troubleshooting, monitoring and running a complex enterprise application to cope with extremely demanding performance requirements. The takeaway is a simple framework you can apply immediately to make sure your applications are great.

Published in: Technology

Transcript of "OutSystems - A Framework Approach for Troubleshooting - NextStep 2012"

  1. Performance: Troubleshooting and Monitoring
      A framework approach
      Paulo Cunha, Solutions Delivery
      NextStep 2012
      www.outsystems.com © 2012 outsystems – all rights reserved
  3. Performance Troubleshooting: Motivations for a framework
      ✓ Deal with emergency scenarios
      ✓ Quick and accurate diagnosis
      ✓ Systematic approach
      ✓ Common metrics and use cases
  4. Performance Troubleshooting: Where is the fire?
      Diagram (Agile Platform environment): Client, Load Balancer, Frontend 1, Frontend 2, Database, External Systems
  5. Performance Troubleshooting: Designing the framework
      1st: Problem classification
        • Where does it happen?
        • When does it happen?
      2nd: Identify possible causes
        • Application
        • Infrastructure
      3rd: Identify resolution strategies
        • Dig deeper
        • Apply known solution
  6. Performance Troubleshooting: Designing the framework
      When \ Where   | All Applications | Specific Application | Specific Operation
      All the time   | ?                | ?                    | ?
      Peak Hours     | ?                | ?                    | ?
      Periodically   | ?                | ?                    | ?
      Off Peak Hours | ?                | ?                    | ?
      Regions of the grid are labeled Pattern 1, Pattern 2, Pattern 3, Pattern 4 and Pattern 5.
  7. Performance Troubleshooting: Framework – Pattern 1
      Where: All Applications. When: All the time.
      Possible cause: Horizontal bottleneck, usually related to the database
      Strategy:
        • Check Service Center reports: Slow SQL
        • Check database server performance counters: CPU, Memory, Disk
  8. Performance Troubleshooting: Framework – Pattern 1 - Example (Travel Search Web Site)
      • High Load System
      • 99.9% availability
      • 4M searches/month
      • 7M daily web hits
      • 50K daily visitors
  9. Performance Troubleshooting: Framework – Pattern 1 - Example (Travel Search Web Site)
      1. Symptoms
        Where: Low performance on web site
        When: During peak hours (24/7), i.e. all the time
      2. Diagnosis
        o Slow SQL reports: queries taking too long
        o DB server CPU ~ 100%
        o SQL Server’s execution plan cache too large
          • Overuse of expand inline parameters
          • Detected platform inefficiency in handling variable-length data types
  10. Performance Troubleshooting: Framework – Pattern 1 - Example (Travel Search Web Site)
      3. Resolution
        o Containment measure: clear the execution plan cache with DBCC FREEPROCCACHE
        o Remove expand inline parameters from queries
        o Agile Platform optimization at query parameterization level
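The Pattern 1 example ties the oversized execution plan cache to overuse of expand inline parameters. The sketch below illustrates that mechanism under a simplifying assumption: the plan cache is modeled as a set keyed by the exact query text. The table, column and function names are hypothetical and only serve the illustration; this is not how the platform or SQL Server is implemented.

```python
def count_cached_plans(queries):
    """Count distinct cache entries when the cache key is the exact query text."""
    return len(set(queries))


# Hypothetical search values arriving from users.
cities = ["Lisbon", "Porto", "Faro", "Lisbon", "Braga"]

# Inlined style: the value is spliced into the SQL text, so every distinct
# value produces a distinct query text and therefore a distinct cached plan.
inlined = [f"SELECT * FROM Hotel WHERE City = '{c}'" for c in cities]

# Parameterized style: the text is constant and the value travels separately,
# so all executions can share one cached plan.
parameterized = ["SELECT * FROM Hotel WHERE City = @City" for _ in cities]

inlined_plans = count_cached_plans(inlined)
parameterized_plans = count_cached_plans(parameterized)
```

In this toy model the inlined queries occupy one cache entry per distinct city, while the parameterized form occupies a single entry regardless of how many values are searched.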
  11. Performance Troubleshooting: Framework – Pattern 2
      Where: All Applications. When: Peak Hours.
      Possible cause: Infrastructure not handling generated load, usually at Front-End or Database level
      Strategy:
        • Check Service Center reports: Slow SQL, Slow Screens
        • Check FE and DB server performance counters: CPU, Memory, Disk
  12. Performance Troubleshooting: Framework – Pattern 2 - Example (Insurance Business Application)
      • Core System
      • 2M Software Units
      • 300 GB Database
      • 400K daily web hits (200K in May 2011)
      • 600 daily users (300 in May 2011)
  13. Performance Troubleshooting: Framework – Pattern 2 - Example (Insurance Business Application)
      1. Symptoms
        Where: Low performance on all applications
        When: During the day, i.e. peak hours
      2. Diagnosis
        o Slow SQL and Slow Screens reports
          • Verified correlation between top queries and top screens
        o DB server CPU @ 100%, Memory ~ 99%
        o DB server inadequate hardware sizing
        o Application data model inefficiencies
          • Big datasets, fragmented indexes
  14. Performance Troubleshooting: Framework – Pattern 2 - Example (Insurance Business Application)
      3. Resolution
        o Data model and query optimizations
          • Add/remove and defragment indexes
          • Split queries and remove expand inline parameters
          • Force TOPs, avoid UNIONs
        o Application logic improvements
          • Timers re-scheduling (for day operations)
          • Enforce refined searches (reduced dataset) and use flat tables for searches
  15. Performance Troubleshooting: Framework – Pattern 3
      Where: Specific Application, Specific Operation. When: All the time, Peak Hours.
      Possible causes:
        • Application/Operation data model, integration or architecture bottleneck (bad design)
        • IIS Worker Process recycle (.NET stack)
      Strategy:
        • Check Service Center reports: Slow SQL / Screens / Extensions / Web References
        • Check Windows Event Viewer on FEs for IIS messages
        • Review application/operation implementation
  16. Performance Troubleshooting: Framework – Pattern 4
      Where: All Applications, Specific Application, Specific Operation. When: Periodically.
      Possible causes:
        • Timers (asynchronous processing)
        • IIS Worker Process recycle (.NET stack)
        • Application/Operation data model, integration or architecture bottleneck (bad design)
      Strategy:
        • Correlate Timer and Screen logs for that period
        • Check Service Center reports for that period: Slow Timers / Screens / Extensions / Web References
        • Check Windows Event Viewer on FEs for IIS messages
        • Review application/operation implementation
  17. Performance Troubleshooting: Framework – Pattern 5
      Where: All Applications, Specific Application, Specific Operation. When: Off Peak Hours.
      Possible causes:
        • Maintenance Tasks (DB, Antivirus)
        • Timers (asynchronous processing)
        • Application/Operation data model, integration or architecture bottleneck (bad design)
      Strategy:
        • Check DB maintenance tasks history
        • Check server’s scheduled tasks and antivirus configurations
        • Check Service Center reports: Slow Timers / Screens / Extensions / Web References
        • Review application/operation implementation
  18. Performance Troubleshooting: Framework – Pattern 5 - Example (Energy Billing System)
      • Batch processing with long timer runs
      • Critical Operation
      • 800 GB Database
  19. Performance Troubleshooting: Framework – Pattern 5 - Example (Energy Billing System)
      1. Symptoms
        Where: Low performance/timeout on specific operation
        When: Night/Off Peak Hours
      2. Diagnosis
        o Slow Timers and Slow SQL reports
        o DB Maintenance Tasks taking 15 hours
        o Timer execution colliding with DB maintenance tasks
        o Timer performance degraded with data growth
  20. Performance Troubleshooting: Framework – Pattern 5 - Example (Energy Billing System)
      3. Resolution
        o Containment measure: increase timer timeouts
        o Optimize DB maintenance tasks
          • Reorganize vs rebuild indexes
        o Reduce data set to be processed
          • Split batches, reorganize data model
          • Archive old data
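The Pattern 5 diagnosis hinges on timer executions colliding with DB maintenance tasks. One way to check for such collisions is a plain interval-overlap test over the two schedules. The sketch below assumes the run windows have already been extracted from the Timer logs and the maintenance history; the timer names, dates and function names are hypothetical.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """True when two half-open time windows [start, end) intersect."""
    return start_a < end_b and start_b < end_a


def colliding_timers(timer_runs, maintenance_windows):
    """Return the names of timer runs that overlap any maintenance window."""
    return [
        name
        for name, start, end in timer_runs
        if any(overlaps(start, end, m_start, m_end)
               for m_start, m_end in maintenance_windows)
    ]


# Hypothetical data: one 15-hour maintenance window and two timer runs.
maintenance = [(datetime(2012, 5, 1, 20, 0), datetime(2012, 5, 2, 11, 0))]
timers = [
    ("BillingBatch", datetime(2012, 5, 2, 1, 0), datetime(2012, 5, 2, 4, 0)),
    ("ArchiveOldData", datetime(2012, 5, 2, 12, 0), datetime(2012, 5, 2, 13, 0)),
]
collisions = colliding_timers(timers, maintenance)
```

Timers reported in `collisions` are candidates for re-scheduling, or a signal that the maintenance window itself needs to shrink.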
  21. Performance Troubleshooting: The Framework
      Possible causes by When (rows) across the Where columns (All Applications, Specific Application, Specific Operation), listed left to right as they appear on the slide:
        • All the time: Database; Application Design; IIS Worker Processes; Application Design
        • Peak Hours: Database; IIS Worker Processes; Application Design; IIS Worker Processes; Application Design
        • Periodically: Timers; IIS Worker Processes; Timers; IIS Worker Processes; Integrations; Timers; Application Design
        • Off Peak Hours: Timers; Maintenance Tasks; Timers; Maintenance Tasks; Timers; Maintenance Tasks; Application Design
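The framework amounts to a lookup from a (where, when) classification to a pattern and its possible causes. The sketch below encodes the five patterns as their own slides define them (Pattern 1 and 2 for all applications, Pattern 3 for a specific application or operation, Patterns 4 and 5 for any scope). The function and variable names are hypothetical, and the cause texts are shortened from the pattern slides.

```python
WHERE = ("All Applications", "Specific Application", "Specific Operation")
WHEN = ("All the time", "Peak Hours", "Periodically", "Off Peak Hours")

# Possible causes per pattern, as listed on the pattern slides.
CAUSES = {
    1: ["Horizontal bottleneck, usually related to the database"],
    2: ["Infrastructure not handling generated load (Front-End or Database)"],
    3: ["Application/Operation design bottleneck",
        "IIS Worker Process recycle"],
    4: ["Timers",
        "IIS Worker Process recycle",
        "Application/Operation design bottleneck"],
    5: ["Maintenance Tasks (DB, Antivirus)",
        "Timers",
        "Application/Operation design bottleneck"],
}


def classify(where, when):
    """Map a (where, when) symptom classification to a pattern number."""
    if where not in WHERE or when not in WHEN:
        raise ValueError(f"unknown classification: {where!r}, {when!r}")
    if when == "Periodically":
        return 4
    if when == "Off Peak Hours":
        return 5
    if where == "All Applications":
        return 1 if when == "All the time" else 2
    return 3


pattern = classify("All Applications", "Peak Hours")
possible_causes = CAUSES[pattern]
```

Keeping the classification separate from the cause list makes it easy to extend the causes as new cases are diagnosed without touching the decision logic.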
  22. Performance Data Tools: How to gather performance data
  23. Performance Data Tools: 3 layers to gather performance metrics
      • Application
      • Application Server
      • Infrastructure
  24. Performance Data Tools: Infrastructure layer (.NET stack)
      Use Windows Performance Counters
      Start menu > Control Panel > Administrative Tools > Performance Monitor
  25. Performance Data Tools: Infrastructure layer (.NET stack)
      Keep counter values below the thresholds
      Performance Counter | Threshold
      Processor(_Total)\% Processor Time | Depends on the server role: FE < 40%, DB < 60%
      Memory\Pages/sec | < 1000 at all times
      PhysicalDisk\Avg. Disk Queue Length | < 2 for each physical disk drive
      TCPv4\Connections Established | < (100 * #worker processes + 50) * 2, and < 3900
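The thresholds above translate directly into an automated check. The sketch below is one possible reading, assuming counter samples have already been exported from Performance Monitor into a dictionary; the dictionary layout, the `disk_queue` key and the function names are hypothetical. It also assumes that both TCP bounds on the slide must hold, so the smaller of the two is used.

```python
def tcp_connection_limit(worker_processes):
    """Slide formula (100 * #worker processes + 50) * 2, capped at 3900."""
    return min((100 * worker_processes + 50) * 2, 3900)


def check_counters(sample, role, worker_processes):
    """Return the names of counters that are at or above their threshold."""
    cpu_limit = {"FE": 40, "DB": 60}[role]  # depends on the server role
    limits = {
        r"Processor(_Total)\% Processor Time": cpu_limit,
        r"Memory\Pages/sec": 1000,
        r"TCPv4\Connections Established": tcp_connection_limit(worker_processes),
    }
    breaches = [name for name, limit in limits.items()
                if sample.get(name, 0) >= limit]
    # The disk queue threshold applies to each physical disk separately.
    for disk, queue_length in sample.get("disk_queue", {}).items():
        if queue_length >= 2:
            breaches.append(rf"PhysicalDisk({disk})\Avg. Disk Queue Length")
    return breaches


# Hypothetical sample from a database server with 2 worker processes.
sample = {
    r"Processor(_Total)\% Processor Time": 72.0,
    r"Memory\Pages/sec": 120,
    r"TCPv4\Connections Established": 300,
    "disk_queue": {"0 C:": 0.4, "1 D:": 3.1},
}
breaches = check_counters(sample, role="DB", worker_processes=2)
```

For this sample, the CPU counter and the second disk's queue length are reported, while memory paging and TCP connections stay within bounds.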
  26. Performance Data Tools: Infrastructure layer (.NET stack)
      Diagram (Agile Platform environment): Client, Load Balancer, Frontend 1, Frontend 2, Database, External Systems
      Metrics shown: CPU, RAM, Disk, Network
  27. Performance Data Tools: Application Server layer (.NET stack)
      Use Windows Event Viewer to check for IIS events
      Start menu > Control Panel > Administrative Tools > Event Viewer
  28. Performance Data Tools: Application Server layer (.NET stack)
      Make sure IIS Application Pools are properly configured
      Follow “Tuning and Security Check list” on Agile Platform .NET Install Checklist
      Event | Threshold
      IIS Worker Process recycle | Recycling should only occur when scheduled and off hours
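The recycle rule can be checked mechanically once the recycle timestamps have been read from the Event Viewer. The sketch below covers only the "off hours" half of the rule and assumes an off-hours window of 22:00 to 06:00; the window, the timestamps and the function name are hypothetical. Whether a recycle was actually scheduled still has to be confirmed against the application pool configuration.

```python
from datetime import datetime, time


def unexpected_recycles(recycle_times,
                        off_hours_start=time(22, 0),
                        off_hours_end=time(6, 0)):
    """Return recycle timestamps that fall outside the off-hours window.

    The window may wrap past midnight (e.g. 22:00 to 06:00).
    """
    def in_off_hours(t):
        if off_hours_start <= off_hours_end:
            return off_hours_start <= t < off_hours_end
        return t >= off_hours_start or t < off_hours_end

    return [ts for ts in recycle_times if not in_off_hours(ts.time())]


# Hypothetical recycle events read from the Windows Event Viewer.
recycles = [
    datetime(2012, 5, 2, 3, 0),    # inside the assumed off-hours window
    datetime(2012, 5, 2, 14, 37),  # mid-afternoon: worth investigating
]
flagged = unexpected_recycles(recycles)
```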
  29. Performance Data Tools: Application Server layer (.NET stack)
      Diagram (Agile Platform environment): Client, Load Balancer, Frontend 1, Frontend 2, Database, External Systems
      Metrics shown: CPU, RAM, Disk, Network, IIS WP recycles
  30. Performance Data Tools: Application layer
      Use Agile Platform’s Service Center reports
      Service Center > Analytics > Reports
  31. Performance Data Tools: Application layer
      Service Center Report | Threshold
      Slow SQL | < 100 occurrences with 500 ms of average duration
      Slow Screen | < 100 occurrences with +1 s of average duration
      Slow Web Service | < 100 occurrences with +1 s of average duration
      Slow Web Reference | < 100 occurrences with +1 s of average duration
      Slow Extension | < 100 occurrences with +1 s of average duration
      Slow Timer | Depends on the business logic
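A small filter can apply these thresholds to rows copied from the Service Center reports. The sketch below reads each threshold as "flag an entry once it reaches 100 occurrences at or above the stated average duration", which is an interpretation of the slide rather than a documented rule. The row layout, entry names and function name are hypothetical. Slow Timer is left out because its threshold depends on the business logic.

```python
# Average-duration thresholds in milliseconds, per report.
MIN_AVG_MS = {
    "Slow SQL": 500,
    "Slow Screen": 1000,
    "Slow Web Service": 1000,
    "Slow Web Reference": 1000,
    "Slow Extension": 1000,
}
MAX_OCCURRENCES = 100


def over_threshold(report, rows):
    """Return names of entries with too many slow occurrences.

    rows is an iterable of (name, occurrences, average_ms) tuples.
    """
    limit_ms = MIN_AVG_MS[report]
    return [
        name
        for name, occurrences, average_ms in rows
        if occurrences >= MAX_OCCURRENCES and average_ms >= limit_ms
    ]


# Hypothetical rows from a Slow SQL report.
slow_sql_rows = [
    ("GetHotelsByCity", 340, 820),   # frequent and slow: flagged
    ("GetUserById", 5000, 12),       # frequent but fast
    ("MonthlyExport", 3, 4200),      # slow but rare
]
flagged_queries = over_threshold("Slow SQL", slow_sql_rows)
```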
  32. Performance Data Tools: Application layer
      Diagram (Agile Platform environment): Client, Load Balancer, Frontend 1, Frontend 2, Database, External Systems
      Metrics shown: Slow Screen, Slow SQL, Slow Extension, Slow Web Reference, Slow Web Service, Slow Timer (each shown twice), CPU, RAM, Disk, Network, IIS WP recycles
  33. Now what? How to prevent performance emergencies
  34. Performance Monitoring: Goals
      ✓ Maintain good performance levels
      ✓ Know your apps/installation expected behavior
      ✓ Identify new patterns and trends
      ✓ No surprises!
  35. Performance Monitoring: 2 Layer Monitoring
      • Agile Platform (Application)
      • Infrastructure
  36. Performance Monitoring: Infrastructure
      Set up monitoring on DB and FE servers
        • CPU, Memory, Disk, Network
        • Windows Services status: IIS, OutSystems Services
        • Database indicators: Size, Average Lock Wait, Index Fragmentation
  37. Performance Monitoring: Infrastructure
      Define thresholds and alarms
        • Start with recommended thresholds
        • Adapt to your requirements
      Use tools already available in IT
        • e.g. Tivoli, OpManager, Nagios
        • Windows Performance Monitor and Event Viewer
  38. Performance Monitoring: Agile Platform
      Collect daily Service Center reports
        • Slow SQL
        • Slow Screens
        • Daily History - Screen Hits, Daily Users (Service Center > Analytics > Daily History)
      Automatically generated by the platform (if active on Server Configuration)
  39. Performance Monitoring: Agile Platform
      Check Error Log daily for timeouts
        • May indicate performance problems
      Increase Log Cycle Period
        • Configuration Tool > Logs tab (default is 4 weeks)
  40. Performance Monitoring: A framework
      1st: Collect
      2nd: Analyze
      3rd: Implement
  41. Performance Monitoring: Phase 1 - Collect
      Gather metrics in one place, e.g. an Excel workbook
      Period depends on criticality: 1 day vs. 1 week
      Register daily events to aid in analysis
        • Know what happened and when, e.g. scheduled maintenance, external downtimes
        • Correlate with performance data
      Make sure to reserve budget for these tasks
        • It must be followed through!
        • E.g. 1 hour daily to collect and analyze
  42. Performance Monitoring: Phase 2 - Analyze
      Focus on “Top 10” most relevant
        • SQL, Screens, Extensions, Web Services
        • By usage or criticality
      Build visualizations (graphs)
        • Better identification of trends
        • Easier to analyze and spot deviations
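The Analyze phase can be approximated with two small helpers: one ranks items by usage to get the "Top 10", the other flags items whose average duration is trending upward. The sketch below assumes the collected metrics are available as a dictionary of hit counts and daily average durations; the data layout, screen names, slope cutoff and function names are hypothetical. The trend is measured as a least-squares slope over the day index.

```python
def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def top_n(history, n=10):
    """Rank items by usage (hit count), most used first."""
    return sorted(history, key=lambda name: history[name]["hits"], reverse=True)[:n]


def rising(history, min_slope):
    """Names whose daily average duration grows faster than min_slope ms/day."""
    return sorted(name for name, item in history.items()
                  if slope(item["avg_ms"]) > min_slope)


# Hypothetical four days of collected Slow Screen data.
history = {
    "SearchScreen": {"hits": 9000, "avg_ms": [400, 450, 520, 610]},
    "LoginScreen": {"hits": 12000, "avg_ms": [200, 195, 205, 198]},
    "ReportScreen": {"hits": 300, "avg_ms": [1500, 1400, 1300, 1200]},
}
most_used = top_n(history, n=2)
getting_worse = rising(history, min_slope=10)
```

Items that appear in both lists are heavily used and degrading, which makes them natural candidates for the next round of improvements.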
  43. Performance Monitoring: Phase 2 - Analyze
  44. Performance Monitoring: Phase 3 - Implement
      Pick “Top X” to address on each sprint
        • Fix them when they are small!
        • Prioritize increasing trends
      Do not postpone!
        • Make it a commitment to implement some improvements every sprint
        • Keeps focus on performance
        • Positive impact on users
  45. Thank you!
      Paulo Cunha, paulo.cunha@outsystems.com