In this webinar we look at how to effectively implement good monitoring practices for your servers and applications.
Recorded webinar: https://www.outsystems.com/learn/courses/29/webinar-effective-platform-server-monitoring/
Free Online training: https://www.outsystems.com/learn/courses/
Boost Server Performance with Infrastructure Monitor
1. Effective Platform Server Monitoring
Webinar / March 24th 2016
Boosting server performance with the Infrastructure Monitor tool and practices
2. Effective Platform Server Monitoring
Paulo Cunha
Platform Operations Team Leader
Expert Services
paulo.cunha@outsystems.com
https://www.linkedin.com/in/paulocunha
@paulofgccunha
3. Agenda
● What is monitoring?
● Why is monitoring so important?
● How to do it?
○ 3 layers of monitoring
○ Multiple suites
● Key metrics when using OutSystems Platform
● Infrastructure Monitor
○ What is it?
○ How it works
○ Requirements
○ Demo
○ Troubleshooting
4. What is monitoring?
“to be aware of the state of a system, to observe a situation for any changes which may
occur over time, using a monitor or measuring device of some sort”
Wikipedia, The Free Encyclopedia, 24 Feb 2016
● Application Performance
● Business Process
● Functional
● Availability
● Errors
● Network
● Infrastructure
5. Why is monitoring so important?
Establish the performance baseline of your system
● No estimations or wishful thinking
● Real measure of service level
Know how your system behaves
● Identify common patterns
● Recognize trends
● Predict issues and scaling needs
Alerts you when attention is needed
● Be warned of potential issues allowing you to react
● Reduce MTTD (mean time to detect) and MTTR (mean time to resolve)
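Both indicators are averages over past incidents, which only works if each incident's timeline is recorded. The sketch below assumes a hypothetical incident log with three timestamps per incident (start, detected, resolved). It also assumes MTTR is measured from the start of the problem; some teams measure it from detection instead.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident log: when each problem started, when it was
# detected (e.g. an alert was received) and when it was resolved.
incidents = [
    {"start": datetime(2016, 3, 1, 9, 0),
     "detected": datetime(2016, 3, 1, 9, 10),
     "resolved": datetime(2016, 3, 1, 10, 0)},
    {"start": datetime(2016, 3, 5, 14, 0),
     "detected": datetime(2016, 3, 5, 14, 30),
     "resolved": datetime(2016, 3, 5, 16, 0)},
]


def mean_duration(incidents, from_key, to_key):
    """Average of (incident[to_key] - incident[from_key]) over all incidents."""
    seconds = mean((i[to_key] - i[from_key]).total_seconds() for i in incidents)
    return timedelta(seconds=seconds)


mttd = mean_duration(incidents, "start", "detected")  # mean time to detect
mttr = mean_duration(incidents, "start", "resolved")  # mean time to resolve (assumed: from start)
print(mttd, mttr)  # 0:20:00 1:30:00
```

Alerting shrinks the "start to detected" gap directly. It also tends to shorten resolution time, because investigation begins earlier.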
6. How to do it?
3 layers of monitoring
Applications
Services
Infrastructure
Application Performance Monitoring (APM)
● End User Experience
Services performance & availability
● Application and DB Servers
● OutSystems Platform Services
Server & network resources
● CPU
● Memory
● I/O
● Network
Developers
Operations
7. How to do it?
Multiple suites
Full-featured
● Cover 1 or more layers
● High number of metrics supported
● Complex dashboards
● Developer vs Operations oriented
● SaaS vs On-Premises deployment
● Licensing costs (some open source)
Mostly used for Production monitoring
● Most critical and to get real usage metrics
● Costs of licensing and operation
● Typically owned by Operations
Monitoring on Development is usually not considered!
● What if your 20 developers aren’t able to work?
8. Key Metrics when using OutSystems Platform
Typical Scenario
Transactional Web Applications
Metrics to watch along the request / response path:
● Transactions / second, query cache usage, server load, availability
● Requests / second, errors, server load, availability
● Requests / second, server load, availability
● Response time, availability
9. Key Metrics when using OutSystems Platform
Applications
Services
Infrastructure
Performance Index (APDEX)
● End User Experience
Client
● Browser
● Operating System
Server
● Screen
● Action
● Query / Integration
Network
● Latency
OutSystems Performance Monitor
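Apdex condenses many response times into a single 0 to 1 score. The sketch below uses the standard Apdex formula: a request is "satisfied" if it completes within the target time T, "tolerating" if it completes within 4T, and "frustrated" otherwise. The score is (satisfied + tolerating / 2) / total. The value T = 1 second and the sample times are assumptions chosen for illustration. The slides do not say how Performance Monitor configures T.

```python
def apdex(response_times, t):
    """Apdex score for response times (seconds) against target threshold t.

    satisfied:  response <= t
    tolerating: t < response <= 4t
    frustrated: response > 4t
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for r in response_times if r <= t)
    tolerating = sum(1 for r in response_times if t < r <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


samples = [0.3, 0.8, 1.2, 2.5, 6.0]  # hypothetical screen response times
print(apdex(samples, t=1.0))  # 0.6
```

Here two requests are satisfied, two are tolerating and one is frustrated, giving (2 + 1) / 5.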
10. Key Metrics when using OutSystems Platform
Applications
Services
Infrastructure
Application Servers
● Requests / second
● Requests queued
● Memory consumption
○ IIS application pools
○ JBoss processes
● Process restarts and their causes
○ IIS application pool recycles
Platform Services, Integrations and Message Queues
● Availability checks
Database Servers
● Transactions / second
● Cache usage
○ Query and data caches
○ Number of query plan recompiles
● Waits and locks
○ Beware of timeouts
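Metrics such as requests / second and transactions / second are often read from counters that only ever increase. To get a rate, you take two readings and divide the difference by the elapsed time. The sketch below assumes a hypothetical (timestamp, cumulative count) sample format. It treats a counter that goes backwards as a reset, for example after a process restart or an application pool recycle, and returns no rate in that case.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

Sample = Tuple[datetime, int]  # (timestamp, cumulative counter value)


def per_second_rate(prev: Sample, curr: Sample) -> Optional[float]:
    """Rate per second between two cumulative-counter samples.

    Returns None if the counter went backwards (e.g. the process restarted),
    since the difference is meaningless across a reset.
    """
    (t0, c0), (t1, c1) = prev, curr
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    if c1 < c0:
        return None
    return (c1 - c0) / elapsed


t = datetime(2016, 3, 24, 9, 0)
print(per_second_rate((t, 1000), (t + timedelta(seconds=30), 2500)))  # 50.0
print(per_second_rate((t, 2500), (t + timedelta(seconds=30), 40)))    # None
```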
11. Key Metrics when using OutSystems Platform
Applications
Services
Infrastructure
Server resources
● CPU usage and queue
● Memory usage
● Network usage and errors
● Disk usage and queue
Adds important context
● Resource consumption
● Scaling needs
● Faster alerting
Can be applied to all servers
13. Infrastructure Monitor
Simple and effective monitoring for the infrastructure layer
● No complex dashboards and metrics
● Surfaces key infrastructure metrics
● Establishes recommended thresholds
Integrated in the platform’s management console
● Same environments and servers
● Right next to Performance Monitor
● Bridge the gap between Developers and Operations
Email alerts
● Based on recommended thresholds and duration of events
Open source
● Get it from the Forge http://outsyste.ms/1U8O9h1
14. Infrastructure Monitor
How it works
Continuously gets metrics from the servers
● Every 30 seconds
● Direct requests to the servers (no agents)
● Uses monitoring standards
○ WMI
○ SNMP (soon…)
Evaluates metric values against thresholds and decides whether to alert
● Every 2 minutes
● If unresolved, the alert is repeated after 12 hours
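The timing rules above can be sketched as a small evaluator: samples arrive every 30 seconds, and the thresholds are checked every 2 minutes. An alert fires only if the metric stayed above its threshold for a whole duration window, and an unresolved alert is repeated after 12 hours. This is a minimal Python sketch of those rules only, not the Infrastructure Monitor's actual code. The `MetricAlert` class, the 80% threshold and the 10-minute duration are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Timings stated on the slide
SAMPLE_INTERVAL = timedelta(seconds=30)  # metrics collected every 30 seconds
EVAL_INTERVAL = timedelta(minutes=2)     # thresholds evaluated every 2 minutes
REPEAT_AFTER = timedelta(hours=12)       # unresolved alerts repeated after 12 hours


@dataclass
class MetricAlert:  # hypothetical helper, not part of the product
    threshold: float         # hypothetical, e.g. 80.0 for CPU %
    min_duration: timedelta  # hypothetical: how long the breach must last
    samples: List[Tuple[datetime, float]] = field(default_factory=list)
    last_alert: Optional[datetime] = None

    def add_sample(self, ts: datetime, value: float) -> None:
        self.samples.append((ts, value))

    def evaluate(self, now: datetime) -> bool:
        """Return True if an alert should be sent at `now`."""
        window_start = now - self.min_duration
        # Drop samples older than the evaluation window.
        self.samples = [(ts, v) for ts, v in self.samples if ts >= window_start]
        window = [v for ts, v in self.samples if ts <= now]
        # Only judge a window that is (almost) fully covered by samples.
        enough = len(window) >= self.min_duration / SAMPLE_INTERVAL
        breached = enough and all(v > self.threshold for v in window)
        if not breached:
            self.last_alert = None  # condition cleared: next breach alerts at once
            return False
        if self.last_alert is None or now - self.last_alert >= REPEAT_AFTER:
            self.last_alert = now
            return True
        return False


# Simulate 14 hours of CPU stuck at 90% with a 10-minute duration rule.
t0 = datetime(2016, 3, 24, 9, 0)
cpu = MetricAlert(threshold=80.0, min_duration=timedelta(minutes=10))
sent = []
for i in range(14 * 120):  # 120 samples per hour
    ts = t0 + i * SAMPLE_INTERVAL
    cpu.add_sample(ts, 90.0)
    if (ts - t0) % EVAL_INTERVAL == timedelta(0):
        if cpu.evaluate(ts):
            sent.append(ts)
print([t.strftime("%H:%M") for t in sent])  # ['09:10', '21:10']
```

The duration rule is a trade-off between catching problems quickly and staying quiet during short spikes. Repeating after 12 hours keeps a long-running issue visible without flooding inboxes.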
15. Infrastructure Monitor
Requirements
● OutSystems Platform 9+
● LifeTime installed
○ Preferably in a dedicated environment
● SQL Server or Oracle database
● Windows/.NET stack
○ Linux/Java stack support to be released soon
16. Infrastructure Monitor
Server Configuration Requirements
WMI
● Connectivity from LifeTime to each server via TCP port 135
● Active Directory user account to access the WMI API
● All servers must belong to the same domain
● Follow the additional server configuration steps at http://outsyste.ms/1lnELXb
SNMP (soon…)
● SNMP installed
● TCP port 161
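Before activating an environment, it can help to confirm that the LifeTime server can open TCP port 135 on each monitored server. The Python sketch below only tests TCP reachability. It does not check the Active Directory credentials or WMI permissions. WMI over DCOM also typically negotiates a dynamic RPC port after contacting port 135, so a successful check here is necessary but not sufficient. The host names in the example are hypothetical.

```python
import socket

WMI_PORT = 135  # TCP port the slide lists for LifeTime-to-server WMI access


def port_reachable(host: str, port: int = WMI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name resolution failure
        return False


def check_servers(servers):
    """Print port reachability for each server name."""
    for server in servers:
        status = "reachable" if port_reachable(server) else "NOT reachable"
        print(f"{server}:{WMI_PORT} {status}")


# Example (hypothetical host names, run from the LifeTime server):
# check_servers(["fe01.example.local", "fe02.example.local"])
```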
17. Infrastructure Monitor
Demo
1. Walkthrough
2. Activating a new environment
3. Environment details and recommended thresholds
18. Infrastructure Monitor
Troubleshooting
Iterative approach: Measure ➔ Analyze ➔ Improve
1. Identify patterns in metrics
2. Correlate with other data (Platform Analytics, logs)
3. Apply corrective measures
○ Scaling or reconfiguration of servers / services
○ Application fixes / improvements
19. Infrastructure Monitor
Troubleshooting example
Pattern
Sustained high CPU usage (around 80%)
When: throughout working hours
● Reaching CPU capacity
● Complex application logic
Possible Measures
➔ Scale the server vertically by adding CPU resources
➔ Scale horizontally by adding a new server
➔ Refactor application logic
When: at specific periods of the day
● Asynchronous processes (timers)
● Anti-virus schedules
● Database backups
Possible Measures
➔ Reschedule asynchronous and background processes
➔ Isolate timers in another front-end
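The two "When" cases can be told apart by grouping CPU samples by hour of day. If every working hour averages around 80% or more, capacity or application logic is the likely cause. If only a few hours are hot, look for timers, anti-virus scans or backups scheduled then. The Python sketch below is one way to do this grouping, not a feature of the Infrastructure Monitor. The 09:00 to 17:59 working hours and the sample data are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

HIGH_CPU = 80.0               # "sustained high CPU usage (around 80%)"
WORKING_HOURS = range(9, 18)  # assumption: 09:00 to 17:59


def high_cpu_hours(samples, threshold=HIGH_CPU):
    """Hours of the day whose average CPU % is at or above `threshold`.

    samples: iterable of (datetime, cpu_percent) pairs.
    """
    by_hour = defaultdict(list)
    for ts, cpu in samples:
        by_hour[ts.hour].append(cpu)
    return sorted(h for h, values in by_hour.items() if mean(values) >= threshold)


def classify(hot_hours):
    """Map hot hours to the two 'When' cases on the slide."""
    if not hot_hours:
        return "no sustained high CPU"
    if all(h in hot_hours for h in WORKING_HOURS):
        return "throughout working hours"
    return "at specific periods of the day"


# One day of 30-second samples with a hypothetical nightly job at 02:00-02:59.
day = datetime(2016, 3, 24)
samples = []
for i in range(24 * 120):
    ts = day + timedelta(seconds=30 * i)
    samples.append((ts, 85.0 if ts.hour == 2 else 30.0))

hot = high_cpu_hours(samples)
print(hot, classify(hot))  # [2] at specific periods of the day
```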
20. Infrastructure Monitor
Troubleshooting example
Pattern
Memory usage shows a sawtooth pattern over time
When: over a few days
● Usually related to application pool recycling when the pool reaches its maximum configured memory
When: within a few hours / the same day
● More severe: an indication that the server does not have enough memory
Possible Measures
➔ Review application pool memory limit configurations
➔ Review application logic for high memory consumption patterns
➔ Increase total server memory
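The length of each "tooth" is what separates the two cases. The Python sketch below finds sudden drops in memory usage, which mark likely recycles, and averages the time between them. The 30% drop ratio and the one-day boundary between "within hours" and "over days" are hypothetical choices, as is the sample data. This is an illustrative heuristic, not how the Infrastructure Monitor detects patterns.

```python
from datetime import datetime, timedelta


def find_drops(samples, min_drop_ratio=0.3):
    """Timestamps where memory fell by at least min_drop_ratio of the previous value.

    samples: list of (datetime, memory_used) pairs in chronological order.
    """
    drops = []
    for (_, prev), (ts, cur) in zip(samples, samples[1:]):
        if prev > 0 and (prev - cur) / prev >= min_drop_ratio:
            drops.append(ts)
    return drops


def sawtooth_period(drops):
    """Average time between consecutive drops, or None with fewer than two drops."""
    if len(drops) < 2:
        return None
    gaps = [b - a for a, b in zip(drops, drops[1:])]
    return sum(gaps, timedelta()) / len(gaps)


def severity(period):
    """Map the period to the slide's two cases (1-day boundary is an assumption)."""
    if period is None:
        return "no sawtooth pattern"
    if period < timedelta(days=1):
        return "within hours: likely not enough memory"
    return "over days: likely application pool recycling at the memory limit"


# Hypothetical data: +100 MB per hour, recycled back to 500 MB every 48 hours.
start = datetime(2016, 3, 1)
samples = [(start + timedelta(hours=h), 500 + 100 * (h % 48)) for h in range(24 * 6)]
print(severity(sawtooth_period(find_drops(samples))))
```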
21. Recap
● Monitoring is crucial to measure, predict and improve
● 3 layers of monitoring
● Multiple suites that you can use (and probably already do)
● Most relevant metrics to keep an eye on
● Infrastructure Monitor as an option within the Platform
○ Together with Performance Monitor (out of the box)
○ Get it from the Forge http://outsyste.ms/1U8O9h1
○ And help us evolve it!
Editor's Notes
Constantly assess the status of a system (at different levels) to determine current or potential issues that may lead to application unavailability or service deterioration (performance, unexpected errors, etc.)
Top goals: Clearer View, Address, Predict and Anticipate
Production first
Criticality, business continuity
Important for DEVs also
Not only performance related
Integration not working
Unexpected errors
AppDynamics, DynaTrace, New Relic, Ruxit
Nagios, PRTG, Monitis
AWS, Azure
Tivoli
Zabbix
OpManager
?
Full featured suites (good and bad - maybe too many features)
Operations oriented (hard to read data for Devs or Business)
For larger networks / multiple technologies / hardware (router, switches)
Complex dashboards / too many metrics
Require corporate deployment (all servers)
Licensing costs per server (?)
Mostly used for Production due to criticality and costs
Perfectly fine if you’re using them on Production (and you should)
Transactional Web Applications
What kind of things need to be monitored
Database availability, load, transactions/sec
FE availability, load, requests/sec
1st layer - Server
CPU (below 80%)
CPU Queue
Memory
App server vs database
Network usage and errors
Disk Space
High CPU every night - check for timers
Memory changes - application pool recycles
High network received - big uploads
Network errors - relate to connections closed unexpectedly