Application Performance Management for Blackboard Learn
Danny Thomas
Noriaki Tatsumi
7/15/2014
Who We Are – Blackboard Performance Team
Teams
• Program
• Server
• Database
• Frontend
Tools
• Monitoring
• APM
• Profiler
• HTTP load generator
• HTTP replay
• Micro-benchmark
• Performance CI
Development
Recent highlights:
• B2 framework stabilization
• Frames elimination
• Server concurrency optimizations
• New Relic instrumentation
APMs at Blackboard
• Production
• Support
• Development
Without a Tool You Are Running a Black Box!
APM Objectives
• Monitoring for visibility
– Centralize
– Improve Dev and Ops communication
• Identify what constitutes performance issues
– Abnormal behaviors
– Anti-patterns
• Detect and diagnose root cause quickly
• Translate into end user experience
Keys to Success
• Choosing the right tool
• Deployment automation
• Alert policies
• Instrumentation
Keys to Success: Choosing the Right Tool
Features
• Real user monitoring (RUM)
• Application and database monitoring and profiling
• Servers, network, and filer monitoring
• Application runtime architecture discovery
• Transaction tracing
• Alert policies
• Reports - SLA, error tracking, custom
• Extension and customization framework
Deployment: SaaS
Deployment: Self-hosting
Data Retention
• Objectives
– Load/hardware forecast
– Business insights via data exploration
• Data types
– Time-series metrics
– Transaction traces
– Slow SQL samples
– Errors
• Data format
– Raw/sampled data
– Aggregated data
• Flexibility: Self-hosted vs. SaaS
Extension Framework
• Custom metrics (see the sketch after this list)
– https://github.com/ntatsumi/newrelic-postgresql
– https://github.com/ntatsumi/appdynamics-blackboard-learn
• Custom dashboards
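Beyond the standalone plugins linked above, application code can also push custom metrics through the agent API. A minimal sketch using the New Relic Java agent API; the reporter class and the metric name are hypothetical, only the "Custom/" naming prefix and recordMetric call follow the agent API:

```java
import com.newrelic.api.agent.NewRelic;

// Hypothetical example: report an application-level gauge as a custom metric.
// Metrics under the "Custom/" prefix can then be charted on custom dashboards.
public class QueueDepthReporter {

    public void report(int queueDepth) {
        // recordMetric takes a metric name and a float value
        NewRelic.recordMetric("Custom/Learn/QueueDepth", queueDepth);
    }
}
```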
Keys to Success: Deployment Automation
Keys to Success: Constructing Alert Policies
Alert Policies – Design Considerations
• Minimize noise and false positives
• Use thresholds (e.g. >90% for 3 minutes)
• Use multiple data points (e.g. CPU + response times; see the sketch after this list)
• Use event types based on severity (e.g. warning, critical)
• Send notifications that require action only
• Test your alerts and notifications
• Continuously tweak
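A minimal sketch of the sustained-threshold plus multiple-data-points idea. It is generic rather than tied to any particular APM's alerting configuration; the class name, method, and threshold values are illustrative only:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical alert rule: fire only when CPU stays above 90% for 3 minutes
// AND average response time is also elevated, to cut down on false positives.
public class CpuAndLatencyAlertRule {
    private static final double CPU_THRESHOLD = 0.90;
    private static final double RESPONSE_TIME_THRESHOLD_MS = 2000;
    private static final Duration SUSTAINED_FOR = Duration.ofMinutes(3);

    private Instant cpuBreachStart;  // null while CPU is below the threshold

    /** Evaluate one monitoring sample; returns true when the alert should fire. */
    public boolean evaluate(Instant now, double cpuUsage, double avgResponseTimeMs) {
        if (cpuUsage < CPU_THRESHOLD) {
            cpuBreachStart = null;            // reset the sustained-breach window
            return false;
        }
        if (cpuBreachStart == null) {
            cpuBreachStart = now;             // start of a potential sustained breach
        }
        boolean sustained = Duration.between(cpuBreachStart, now).compareTo(SUSTAINED_FOR) >= 0;
        boolean latencyElevated = avgResponseTimeMs > RESPONSE_TIME_THRESHOLD_MS;
        return sustained && latencyElevated;  // require both signals before notifying
    }
}
```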
Alert Policies - Rule Conditions
• Application: Downtime, errors, application resource metrics, Apdex score
• Server: Downtime, CPU usage, disk space, disk IO, memory usage
• Key transactions: Errors, Apdex score
Alert Policies - Apdex
• Industry-standard way to measure users' perception of satisfactory application responsiveness
• Converts many measurements into one number on a uniform scale of 0 to 1 (0 = no users satisfied, 1 = all users satisfied)
• Apdex score = (Satisfied Count + Tolerating Count / 2) / Total Samples
• Example: 100 samples with a target time of 3 seconds, where 60 are below 3 seconds, 30 are between 3 and 12 seconds, and the remaining 10 are above 12 seconds: (60 + 30/2) / 100 = 0.75 (worked through in the snippet below)
• http://en.wikipedia.org/wiki/Apdex
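The slide's arithmetic can be sanity-checked with a few lines of code; this is just the Apdex formula above, not a vendor API:

```java
// Apdex = (satisfied + tolerating / 2) / total samples
public class ApdexExample {
    static double apdex(int satisfied, int tolerating, int total) {
        return (satisfied + tolerating / 2.0) / total;
    }

    public static void main(String[] args) {
        // Slide example: target time 3s, 60 satisfied (< 3s),
        // 30 tolerating (3s to 12s), 10 frustrated (> 12s)
        System.out.println(apdex(60, 30, 100));  // prints 0.75
    }
}
```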
Keys to Success: Instrumentation
Instrumentation Entry Points
• APM tools generally require an entry point to treat other activity as 'interesting':
Web
• HTTP requests
• Request URI, parameters
Non-Web
• Scheduled tasks
• Background threads
Event / Counter
• Message Queuing
• JMX
• Application
Common Instrumentation
• Once an entry point is reached, default instrumentation typically includes:
– Servlets (Filters, Requests)
– Web frameworks (Spring, Struts, etc.)
– Database calls (JDBC)
– Errors via logging frameworks and uncaught exceptions
– External HTTP services
Custom Instrumentation
• Depending on the APM, this varies from custom entry points to a more flexible but more complex sensor approach
• New Relic supports native API and XML-based configuration (see the sketch after this list)
– The April release of Learn ships with New Relic capabilities
– Including instrumentation for:
• Errors
• Real-user monitoring
• Scheduled (bb-task) and queued tasks
• 'Default' servlet requests for static files
– Additional XML-based configuration, for features such as message queue handlers, is available from: https://github.com/blackboard/newrelic-blackboard-learn
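For the API route, a minimal sketch of what a custom entry point can look like with the New Relic Java agent API. The task class and its work are hypothetical; the @Trace annotation, setTransactionName, and noticeError calls are standard agent API, but check the documentation for your agent version:

```java
import com.newrelic.api.agent.NewRelic;
import com.newrelic.api.agent.Trace;

// Hypothetical background task: without an entry point the agent would ignore it,
// so @Trace(dispatcher = true) starts a transaction when the method is invoked.
public class NightlyReportTask {

    @Trace(dispatcher = true)
    public void run() {
        // Group this work under a readable transaction name instead of the class/method default
        NewRelic.setTransactionName("Task", "NightlyReport");
        try {
            generateReport();
        } catch (RuntimeException e) {
            NewRelic.noticeError(e);  // surface failures in the APM error view
            throw e;
        }
    }

    private void generateReport() {
        // ... actual work ...
    }
}
```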
Real User Monitoring (RUM)
• Real-user monitoring inserts JavaScript snippets into pages
• Allows the APM tool to measure end to end:
– Web application contribution, as transactions are uniquely identified
– Network time
– DOM processing and page rendering time
– JavaScript errors
– AJAX requests
• By browser
• By location
System Monitoring
• Some tools may have no support for system-level statistics, as they're application focused
• If not available, the application's contribution in terms of CPU usage, heap and native memory utilisation is accounted for by JVM statistics (see the sketch after this list)
• System statistics are typically provided by a separate daemon process
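If the tool doesn't collect system-level statistics, the JVM-side numbers mentioned above can be read directly from the standard platform MXBeans; a minimal sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Read the JVM's own view of heap usage, system load and GC time via platform MXBeans.
public class JvmStats {
    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("Heap: %d MB used of %d MB max%n",
                heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));

        // System load average (-1 if the platform does not support it)
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        System.out.printf("System load average: %.2f%n", load);

        // Cumulative time spent in garbage collection, per collector
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d ms total GC time%n", gc.getName(), gc.getCollectionTime());
        }
    }
}
```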
Demonstration – New Relic
Best Practices
Deployment
• Start slowly:
– APM can introduce performance side effects (typically ~5%, could be much higher if misconfigured)
– Allow enough time to establish a baseline to compare changes against
• Deploy end to end; avoid the temptation to instrument only some hosts
• Follow APM vendor best practices
Sizing/Scaling
• Oversizing application resources can be as harmful as undersizing
• Of most interest (see the sketch after this list):
– Tomcat executor threads
– Connection pool sizing (available via JMX in the April release; can be implied from executor usage)
– Heap utilisation, garbage collection time
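A rough sketch of reading executor thread usage over JMX from inside the Tomcat JVM. The executor name "tomcatThreadPool" and the Catalina ObjectName pattern are assumptions based on Tomcat's defaults and should be adjusted to match your server.xml:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Inspect executor thread usage from the platform MBean server of the Tomcat JVM.
// Attribute names follow Tomcat's StandardThreadExecutor MBean.
public class ExecutorUsage {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName executor = new ObjectName("Catalina:type=Executor,name=tomcatThreadPool");

        int active = (Integer) server.getAttribute(executor, "activeCount");
        int max = (Integer) server.getAttribute(executor, "maxThreads");
        System.out.printf("Executor threads: %d active of %d max%n", active, max);
    }
}
```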
Troubleshooting Issues
• Compare with your baseline
• Trust the data
• Use APM as a starting point; dig deeper into suspected components
• Provide as much data as possible when reporting an issue (e.g. screenshots)
Q&A