F5 Networks device monitoring system. It collects all metrics, applies functions to them, and correlates the resulting values to pinpoint problems quickly without deep drill-down.
2. TEST TOOLS USED TO GENERATE TRAFFIC, TEST METRICS AND CONFIRM HOW THEY WORK
- Apache bench
- Hping3
- Openssl
- Httperf
- Curl
- Vegeta
- Tcpreplay
Note: All preparation was done on an F5 Virtual Appliance, so hardware-related metrics are not available here.
They are covered as well, since the system was also tested and implemented on hardware systems (reference work: eBay Inc.).
Key Points
- Anyone who wants to deploy the system must get training from us on both its usage and the key F5 metrics and their meanings.
- What any metric measures can be shown live by generating the relevant traffic with the test tools or by replaying captures with tcpreplay.
Example: An SSLv3 request cannot complete a handshake because the system does not accept SSLv3 by default on newer versions, so the SSL Handshake Failures counter rises.
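A minimal Python sketch of how such a test could be confirmed: read the counter before and after the generated traffic and check that it rose. The function names are hypothetical, and the wrap handling assumes the value behaves like an SNMP Counter32 or Counter64, which restarts from zero after its maximum.

```python
def counter_delta(before: int, after: int, bits: int = 32) -> int:
    """Increase of a monotonically growing counter between two samples.

    Assumption: the counter wraps to zero at 2**bits and wrapped at most
    once between the two samples.
    """
    if after >= before:
        return after - before
    # The counter wrapped: count up to the maximum, then from zero.
    return after + (1 << bits) - before


def metric_reacted(before: int, after: int, bits: int = 32) -> bool:
    """True if the counter rose while the test traffic was running."""
    return counter_delta(before, after, bits) > 0
```

For the SSLv3 example, the handshake failure counter would be sampled, the SSLv3 request sent, and the counter sampled again; `metric_reacted` should then return True.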
3. BASIC DEPLOYMENT EXAMPLE
[Diagram: F5 physical hardware and F5 VM appliances connected to the Seralto F5 Monitoring Server (on premises or in the cloud) via SNMP, sFlow and SSH]
- Active/Passive via floating IP
- Active/Active via self IP or management IP
Note: Servers can be deployed as a cluster for HA if required.
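The addressing rule in this deployment (floating IP for Active/Passive, self or management IP for Active/Active) could be expressed as a small helper. This is only a sketch: the `F5Cluster` type, its field names and the mode strings are assumptions made for illustration, not part of the product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class F5Cluster:
    # All names here are hypothetical, chosen for this sketch only.
    name: str
    mode: str                       # "active-passive" or "active-active"
    floating_ip: str = ""
    member_ips: List[str] = field(default_factory=list)  # self or management IPs


def poll_targets(cluster: F5Cluster) -> List[str]:
    """Return the addresses the monitoring server should collect from."""
    if cluster.mode == "active-passive":
        # The floating IP follows whichever unit is currently active.
        if not cluster.floating_ip:
            raise ValueError(f"{cluster.name}: floating IP required")
        return [cluster.floating_ip]
    if cluster.mode == "active-active":
        # Every unit carries traffic, so each one is addressed directly.
        if not cluster.member_ips:
            raise ValueError(f"{cluster.name}: member IPs required")
        return list(cluster.member_ips)
    raise ValueError(f"{cluster.name}: unknown mode {cluster.mode!r}")
```

With this shape, an Active/Passive pair yields a single target while an Active/Active pair yields one target per unit.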
5. GLOBAL METRICS MAIN DASHBOARD
[Dashboard screenshot: hovering over the "i" icon brings up a popup that explains all the metrics on that graph. Selectors let the user choose between devices or clusters, interfaces, and VLANs.]
6. Below are the detailed topics we collect and store as metrics
GLOBAL DETAIL
Global Values & States
- Arp Max Entries
- Arp Timeout
- Maintenance Mode Enable/Disable
- Packet Filter Enable/Disable
- Watchdog Service Enable/Disable
Global Memory Related
- System Global Mem & Usage
- TMM Dedicated Mem & Usage
- Global Swap Usage
Global Cpu & Errors / Drops
- Global Cpu Usage Ratio
- Memory Related Connection Errors
- Global Dropped Packets
- Incoming Packet Errors
- Outgoing Packet Errors
- Rejects because of vserver limit
- Connections that could not be processed from vserver, nat, snat
- Drops related to License Limitation
- Memory Related Ip Errors
- Ip Routing Errors
- Protocol Related Discards
- Gateway or Next Hop unreachable
Main Global Metrics
Http Global Metrics
- Http Request / Second
- Http 2xx Response / Second
- Http 3xx Response / Second
- Http 4xx Response / Second
- Http 5xx Response / Second
- Http Get Requests
- Http Post Requests
- Http v0.9 Requests
- Http v1.0 Requests
- Http v1.1 Requests
7. Global Interface & VLAN Metrics
- Incoming PPS
- Outgoing PPS
- Incoming Bandwidth
- Outgoing Bandwidth
- Incoming Errors
- Outgoing Errors
- Incoming Drops
- Outgoing Drops
[Screenshots: options to choose between interfaces; options to choose between VLANs]
8. Global Traffic Metrics
Global Client Side Traffic Metrics
(Non Hardware Accelerated Packets)
- Client Side Incoming PPS
- Client Side Outgoing PPS
- Client Side Incoming Bandwidth
- Client Side Outgoing Bandwidth
- Client Side Maximum Connection
(Hardware Accelerated)
- Client Side Incoming PPS
- Client Side Outgoing PPS
- Client Side Incoming Bandwidth
- Client Side Outgoing Bandwidth
- Client Side Maximum Connection
Global Server Side Traffic Metrics
(Non Hardware Accelerated Packets)
- Server Side Incoming PPS
- Server Side Outgoing PPS
- Server Side Incoming Bandwidth
- Server Side Outgoing Bandwidth
- Server Side Maximum Connection
(Hardware Accelerated)
- Server Side Incoming PPS
- Server Side Outgoing PPS
- Server Side Incoming Bandwidth
- Server Side Outgoing Bandwidth
- Server Side Maximum Connection
9. Global Client Side SSL Metrics
Global Client Side Ssl Connection, Tps, Encryption
and Decryption Traffic & SSL TYPE ERRORS
- SSL Current Connections
- SSL Full hardware Accelerated TPS
- SSL Partial hardware Accelerated TPS
- SSL No hardware Accelerated TPS
- SSL Encrypted Incoming Traffic (Bandwidth)
- SSL Encrypted Outgoing Traffic (Bandwidth)
- SSL Decrypted Incoming Traffic (Bandwidth)
- SSL Decrypted Outgoing Traffic (Bandwidth)
- SSL Premature Disconnects Counter
- SSL Handshake Failures Counter
- SSL Fatal Alerts Counter
- Port 443 Connection but no SSL Proto
- Rejected: Client does not Support SNI
Global Client Side SSL Tps vs TLS Versions
& Encryption methods / Key Exchange Methods
- TPS vs TLS Versions
- No Encryption Method
- AES CBC
- DES CBC
- OLD SSLV2 Cipher
- RC2 CBC
- RC4
- MD5 DIGEST
- SHA DIGEST
- AES GCM
- CAMELLIA CBC
- Anonymous Diffie-Hellman Key Exchange
- Diffie-Hellman with RSA Certificate Key Exchange
- Ephemeral Diffie-Hellman w/ RSA cert. Key Exchange
- Ephemeral ECDH w/ RSA cert. Key Exchange
- Fixed ECDH with RSA signed cert. Key Exchange
- Ephemeral ECDH with ECDSA cert. Key Exchange
- Fixed ECDH with ECDSA cert. Key Exchange
- Ephemeral DH with DSS cert. Key Exchange
Note: Test example - TLSv1.2,ECDHE-RSA-AES128-GCM-SHA256,2048,128
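To relate a test record like the one in the note to the metric groups listed above, the record can be split into its parts. The sketch below is an assumption-laden illustration: it handles only OpenSSL-style suite names of the form KX-AUTH-BULK-DIGEST, the function names are hypothetical, and the meaning of the last two fields (2048 and 128) is not stated in the source, so they are kept under neutral names.

```python
from typing import Dict

# Labels taken from the key exchange list above; the mapping itself is
# an illustration, not the product's classification logic.
KEY_EXCHANGE_LABELS = {
    ("ECDHE", "RSA"): "Ephemeral ECDH w/ RSA cert.",
    ("ECDHE", "ECDSA"): "Ephemeral ECDH with ECDSA cert.",
    ("DHE", "RSA"): "Ephemeral Diffie-Hellman w/ RSA cert.",
    ("DHE", "DSS"): "Ephemeral DH with DSS cert.",
}


def parse_cipher_record(record: str) -> Dict[str, str]:
    """Split 'version,suite,field3,field4' and break the suite into parts."""
    version, suite, third, fourth = [p.strip() for p in record.split(",")]
    parts = suite.split("-")
    if len(parts) < 4:
        raise ValueError(f"unsupported suite name: {suite}")
    return {
        "tls_version": version,
        "key_exchange": parts[0],
        "authentication": parts[1],
        "bulk_cipher": "-".join(parts[2:-1]),
        "digest": parts[-1],
        "third_field": third,    # meaning not stated in the source
        "fourth_field": fourth,  # meaning not stated in the source
    }


def key_exchange_label(parsed: Dict[str, str]) -> str:
    """Name the key exchange group a parsed record would count toward."""
    key = (parsed["key_exchange"], parsed["authentication"])
    return KEY_EXCHANGE_LABELS.get(key, "unclassified")
```

For the test example above, the record would fall under the TLSv1.2 version counter, the AES GCM group and the ephemeral ECDH with RSA certificate group.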
10. Global Server Side SSL Metrics
Global Server Side Ssl Connection, Tps, Encryption
and Decryption Traffic & SSL TYPE ERRORS
- SSL Current Connections
- SSL Full hardware Accelerated TPS
- SSL Partial hardware Accelerated TPS
- SSL No hardware Accelerated TPS
- SSL Encrypted Incoming Traffic (Bandwidth)
- SSL Encrypted Outgoing Traffic (Bandwidth)
- SSL Decrypted Incoming Traffic (Bandwidth)
- SSL Decrypted Outgoing Traffic (Bandwidth)
- SSL Premature Disconnects Counter
- SSL Handshake Failures Counter
- SSL Fatal Alerts Counter
- Port 443 Connection but no SSL Proto
- Rejected: Client does not Support SNI
Global Server Side SSL Tps vs TLS Versions
& Encryption methods / Key Exchange Methods
- TPS vs TLS Versions
- No Encryption Method
- AES CBC
- DES CBC
- OLD SSLV2 Cipher
- RC2 CBC
- RC4
- MD5 DIGEST
- SHA DIGEST
- AES GCM
- CAMELLIA CBC
- Anonymous Diffie-Hellman Key Exchange
- Diffie-Hellman with RSA Certificate Key Exchange
- Ephemeral Diffie-Hellman w/ RSA cert. Key Exchange
- Ephemeral ECDH w/ RSA cert. Key Exchange
- Fixed ECDH with RSA signed cert. Key Exchange
- Ephemeral ECDH with ECDSA cert. Key Exchange
- Fixed ECDH with ECDSA cert. Key Exchange
- Ephemeral DH with DSS cert. Key Exchange
11. Global Tcp / Udp Metrics
Global Tcp Metrics
- Open Tcp Connections
- Accepted Tcp Connections
- Established Tcp Connections
- Idle Timeout Expired Connections
- Received Syn Cookie
- Retransmit Segment
- Tcp Received Reset
- Tcp Close Wait
- Tcp Time Wait
- Tcp Fin Wait
- Tcp Fin Wait2
- Tcp Accepted Failed Connections
- Tcp Failed Connections
- Tcp Bad Checksum
- Tcp Out of Order Segment
- Bad Syn Cookie
- Syn Cache Overflow
Global Udp Metrics
- Udp Open Connections
- Udp Accepted Connections
- Udp Established Connections
- Udp Failed Accepted Connections
- Udp Failed Established Connections
- Udp Rx Dgram
- Udp Tx Dgram
- Udp Expires
- Udp Bad Dgram
- Udp Unreachable
- Udp Bad Sum
- Udp No Sum
12. Global Http Compression & WebAcceleration Metrics
Global Http Compression Metrics
- Http Uncompressed Data
- Http Compressed Data
- Compression Ratio
Http Bandwidth Saving comparison vs Client&Serverside Bandwidth
- Non Compressed Http data as Bandwidth
- Compressed Http Data as Bandwidth
- Client Side Tx Bandwidth
- Server Side Tx Bandwidth
A very good example of a result that users cannot get with native approaches:
The HTTP uncompressed metric is recorded only as a cumulative byte count, so persecond() and scale(8) are applied to show it as bandwidth per second (bits per second).
It is compared with the traffic sent to clients and the traffic received from servers to show accurately the comparison and the gain from HTTP compression.
Note: To avoid confusion:
- Client Side Rx: External to F5
- Client Side Tx: F5 to External
- Server Side Rx: F5 to Servers
- Server Side Tx: Servers to F5
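The persecond() and scale(8) steps can be illustrated in plain Python. The monitoring backend is not named in the source, so the functions below are stand-ins written for this sketch, and `compression_saving` is a hypothetical helper added to show how the gain could be expressed.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]           # (unix timestamp, cumulative bytes)
Point = Tuple[float, Optional[float]]  # (unix timestamp, value or None)


def per_second(samples: List[Sample]) -> List[Point]:
    """Turn a cumulative counter into a per-second rate between samples."""
    rates: List[Point] = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or v1 < v0:
            rates.append((t1, None))   # bad interval or counter reset
        else:
            rates.append((t1, (v1 - v0) / dt))
    return rates


def scale(series: List[Point], factor: float) -> List[Point]:
    """Multiply every value by a constant, e.g. 8 for bytes to bits."""
    return [(t, None if v is None else v * factor) for t, v in series]


def compression_saving(uncompressed_bps: float, compressed_bps: float) -> float:
    """Fraction of bandwidth saved by compression (0.0 when no traffic)."""
    if uncompressed_bps <= 0:
        return 0.0
    return 1.0 - compressed_bps / uncompressed_bps
```

Applying `scale(per_second(samples), 8)` to the uncompressed and compressed byte counters gives two bits-per-second series that can be plotted next to the Client Side Tx and Server Side Tx bandwidth.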
Global Web Acceleration Metrics
- Web Cache Hits
- Web Cache Misses
- Web Cache Hit Bytes
- Web Cache Miss Bytes
- Web Cache Item Count
- Web Cache evictions
15. Node Metrics
Non Hardware Accelerated Node Metrics
- Node based incoming pps
- Node based outgoing pps
- Node based incoming bandwidth
- Node based outgoing bandwidth
- Node based Max Connection value to date
- Node based Current Connection
- Node based Current Session
- Node Based Current Requests
Hardware Accelerated Node Metrics
- Node based incoming pps
- Node based outgoing pps
- Node based incoming bandwidth
- Node based outgoing bandwidth
- Node based Max Connection value to date
- Node based Current Connection
- Node based Current Session
- Node Based Current Requests
16. Pool Metrics
Non Hardware Accelerated Pool Metrics
- Pool based incoming pps
- Pool based outgoing pps
- Pool based incoming bandwidth
- Pool based outgoing bandwidth
- Pool based Max Connection value to date
- Pool based Current Connection
- Pool based Current Session
- Pool Based Current Requests
Hardware Accelerated Pool Metrics
- Pool based incoming pps
- Pool based outgoing pps
- Pool based incoming bandwidth
- Pool based outgoing bandwidth
- Pool based Max Connection value to date
- Pool based Current Connection
- Pool based Current Session
- Pool Based Current Requests
17. Vserver Metrics
Non Hardware Accelerated Vserver Metrics
- Vserver based incoming pps
- Vserver based outgoing pps
- Vserver based incoming bandwidth
- Vserver based outgoing bandwidth
- Vserver based Current Connection
- Vserver based Current Requests
Hardware Accelerated Vserver Metrics
- Vserver based incoming pps
- Vserver based outgoing pps
- Vserver based incoming bandwidth
- Vserver based outgoing bandwidth
- Vserver based Current Connection
- Vserver based Hardware Syn Cookies
- Vserver based Hardware Syn Cookie Accepts
Vserver Detail Metrics
- Vserver Connection Durations (Min/Med/Max)
- Software Syn Cookies
- Software Syn Cookie Accepts
- Evicted Connections
- Error counter for vservers with no bound node up
- Slow Killed Connections
18. Profile Metrics
Note: Each virtual server's profile has to be unique in order to have fully segmented metrics. For example, virtual server X uses the http_x profile among the HTTP profiles, and virtual server Y uses the http_y profile. The profile settings themselves do not need to differ, but this profile segmentation must be done so that each vserver's profile metrics are clearly visible. This way everything can be drilled down in deep detail to find the root cause extremely quickly, and 1 or 2 graphs will be enough to say exactly where the problem is.
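Whether the profile segmentation described in this note is in place can be checked mechanically. A minimal sketch, assuming the vserver-to-profile bindings have already been exported into a dictionary (how they are exported is outside this example, and the function name is hypothetical):

```python
from collections import defaultdict
from typing import Dict, List


def shared_profiles(bindings: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each profile used by more than one vserver to those vservers.

    An empty result means every profile is unique to one vserver, so
    profile metrics can be read as per-vserver metrics.
    """
    users = defaultdict(set)
    for vserver, profiles in bindings.items():
        for profile in profiles:
            users[profile].add(vserver)
    return {p: sorted(v) for p, v in users.items() if len(v) > 1}
```

Any profile returned here would mix the metrics of several vservers and is a candidate for splitting into per-vserver copies such as http_x and http_y.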
19. Profile Metrics Example
Http Profile Metrics
Note: We use one example profile to show the profile metrics in detail for the vserver it is bound to. Since this profile is unique and bound to only 1 vserver, everything we see is that vserver's deep detail. Giving the profile and the vserver the exact same name makes them easier to correlate. This way we can correlate the unique per-vserver profile metrics with the global ones, which gives the ability to find which vserver and which pool/nodes etc. an issue is related to.
Http Profile Metrics
- Profile based Http Req/Sec
- Profile based Http Get Req/Sec
- Profile based Http Post Req/Sec
- Profile based Http Cookie Insert/Sec
Http Profile Metrics
- Profile based 2xx Res/Sec
- Profile based 3xx Res/Sec
- Profile based 4xx Res/Sec
- Profile based 5xx Res/Sec
Http Profile Metrics
- Profile based Http Responses between 0-1 Kbyte
- Profile based Http Responses between 1-4 Kbyte
- Profile based Http Responses between 4-16 Kbyte
- Profile based Http Responses between 16-32 Kbyte
21. Irule Metrics Example
- Irule Failed Executions
- Irule Aborted Executions
- Irule Total Executions
- Irule Execution Min Cpu Cycle
- Irule Execution Average Cpu Cycle
- Irule Execution Max Cpu Cycle
23. Cpu Cycles TMM vs Vservers Per Second
Use Case
We can closely follow TMM, vserver and iRule CPU cycles all together.
The device updates CPU cycle counters for 3 topics: TMM global, vserver, and iRule inside TMM.
With those values we at least have an idea of what generates most of the CPU usage, and we can count the rest as other TMM operations. Any CPU spike generated by an iRule or a vserver can be detected.
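The split into vserver, iRule and "other" TMM work can be written as a small calculation. This sketch assumes the three inputs are per-second cycle rates and that vserver and iRule cycles do not overlap; the source does not state whether iRule cycles are already included in the vserver figure, so treat the result as illustrative.

```python
from typing import Dict


def cpu_cycle_breakdown(tmm_total: float, vserver: float, irule: float) -> Dict[str, float]:
    """Share of TMM CPU cycles spent in vservers, iRules and everything else."""
    if tmm_total <= 0:
        raise ValueError("tmm_total must be positive")
    # Whatever is not attributed to vservers or iRules is counted as
    # other TMM operations; clamp at zero in case samples are skewed.
    other = max(tmm_total - vserver - irule, 0.0)
    return {
        "vserver": vserver / tmm_total,
        "irule": irule / tmm_total,
        "other": other / tmm_total,
    }
```

A sudden rise of the iRule or vserver share between two samples points to the source of a CPU spike.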
24. Interface or Vlan Rx / Tx PPS vs Highest RX / TX PPS Using Vserver -- (Realtime)
Note: Pools/nodes etc. can be compared as well. The view can be set to the 3 highest users, to a % ratio, and so on.
Any method can be set; all we need is your requirement.
Interface or Vlan PPS vs Vservers Top one PPS Realtime
Use Case
In this example use case we compare the overall VLAN Rx and Tx PPS with the vserver using the most PPS, which can change in real time as values change. This way, in case of network-related changes, such a change can be detected right away.
As shown below, in the left graphs the test_ssl vserver used the most PPS against the VLAN PPS, and in the right one the common_test vserver is the highest. So imagine you have 300 vservers and want to follow which one is the highest and which values to correlate: overall, ratio, changes etc. All of this can be done easily. This gives a lot of power to find problems really fast.
Note: This can be applied between any things needed. For example, we can find which vserver generates the highest 5xx HTTP responses, or correlate the vserver receiving the most HTTP requests with the device's global HTTP requests, etc.
Top PPS Usage Vserver
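Picking the top PPS vserver and relating it to the VLAN total can be sketched as below. The function name and input shape are assumptions for illustration; the vserver names in the usage note are the ones from the graphs.

```python
from typing import Dict, List, Tuple


def top_vservers(
    pps_by_vserver: Dict[str, float], vlan_pps: float, n: int = 1
) -> List[Tuple[str, float, float]]:
    """Return the n busiest vservers as (name, pps, percent of VLAN pps)."""
    ranked = sorted(pps_by_vserver.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [
        (name, pps, 100.0 * pps / vlan_pps if vlan_pps > 0 else 0.0)
        for name, pps in ranked
    ]
```

Called on every collection interval with the current values, the first entry switches from test_ssl to common_test as soon as the latter carries more packets, which is the real-time behavior described in the use case. Setting `n=3` gives the "highest 3" variant.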
25. Actual Bandwidth Vs Http Compression Bandwidth Savings
Use Case
F5 SNMP and sFlow do not provide the ability to do calculations or apply functions on values; they only publish the values. We prepared example calculations, and many more variations can be built for your needs.
In this example HTTP compression is calculated per second as bandwidth. F5 only reports the values as overall byte totals, so natively they can be seen only as growth, not as a per-second value that shows bandwidth-wise performance.
26. IRULE USE CASE
This CPU cycles monitor gives you the ability to check how many CPU cycles an iRule spends overall, and to see the effect when you polish the iRule for better system resource usage. It can be correlated with TMM or overall CPU usage.
An example for iRules:
With this comparison we can compare the CPU spending of iRules and test the changes after an optimization is done. We can find which rules spend the most CPU cycles with the highestcurrent() or highestmax() functions.
After you find out which iRules are the most CPU intensive, you can focus on optimizing those, compare the results afterwards, and check them against the overall TMM, vserver and iRule CPU cycles to correlate.
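The selection done with highestcurrent() and highestmax() for iRules can be sketched in plain Python. These are stand-ins written for this example, not the backend's own implementation; each series is assumed to be a list of CPU cycle values per iRule, with None marking a missing sample.

```python
from typing import Dict, List, Optional

Series = Dict[str, List[Optional[float]]]


def _last_value(values: List[Optional[float]]) -> Optional[float]:
    """Most recent non-missing value of a series, or None if all are missing."""
    for v in reversed(values):
        if v is not None:
            return v
    return None


def highest_current(series: Series, n: int = 1) -> List[str]:
    """Names of the n series with the highest most-recent value."""
    scored = [(name, _last_value(vals)) for name, vals in series.items()]
    scored = [(name, v) for name, v in scored if v is not None]
    return [name for name, _ in sorted(scored, key=lambda s: s[1], reverse=True)[:n]]


def highest_max(series: Series, n: int = 1) -> List[str]:
    """Names of the n series with the highest peak value in the window."""
    scored = [
        (name, max(v for v in vals if v is not None))
        for name, vals in series.items()
        if any(v is not None for v in vals)
    ]
    return [name for name, _ in sorted(scored, key=lambda s: s[1], reverse=True)[:n]]
```

The two functions answer different questions: `highest_current` shows which iRule is expensive right now, while `highest_max` shows which one produced the largest spike in the window, which is useful when comparing before and after an optimization.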
27. • As long as we can collect values, we can find a way to apply checks on metrics.
• We can apply the given functions to metrics for calculations as needed.
• Since F5 also has an SSH shell, metrics that cannot be obtained with SNMP or sFlow can be collected with scripts and written as metrics, so we have a lot of options.
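Writing script-collected values as metrics could look like the sketch below. It assumes the values have already been gathered over SSH into a dictionary, and that the metric store accepts plaintext lines of the form "path value timestamp"; the source does not name the store, so both the line format and the `f5.lb01` prefix in the usage note are assumptions.

```python
import re
from typing import Dict, List, Union

Number = Union[int, float]


def sanitize(name: str) -> str:
    """Replace characters that are unsafe in a metric path with underscores."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip()).strip("_")


def to_metric_lines(prefix: str, values: Dict[str, Number], timestamp: float) -> List[str]:
    """Format collected values as 'prefix.name value timestamp' lines."""
    return [
        f"{prefix}.{sanitize(key)} {value} {int(timestamp)}"
        for key, value in sorted(values.items())
    ]
```

For example, `to_metric_lines("f5.lb01", {"conn count": 42}, 1700000000)` produces one line per collected value, ready to be sent to the metric store by whatever transport it expects.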
28. Future Plans for the Next Version(s)
• sFlow integration to collect metrics per client IP and see all traffic details. For example, the client that sends the most HTTP requests across the vservers can be identified and correlated with everything else.
• We are going to combine this monitoring and analytics methodology with Twitter's anomaly detection system or with an HTM anomaly detection system.
Note: We are working on both approaches to have a system with anomaly detection alarming rather than threshold alarming.
[Graphs: Twitter's anomaly detection pattern, OK vs FAIL. Graphs taken from anomaly.io]
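The difference between threshold alarming and anomaly alarming can be illustrated with a deliberately simple stand-in. The detector below uses a rolling median and median absolute deviation (MAD); it is not Twitter's algorithm and not HTM, only a sketch of the idea that a value is judged against recent behavior instead of a fixed limit. The window size and the factor k are arbitrary assumptions.

```python
from statistics import median
from typing import List


def threshold_alarms(values: List[float], limit: float) -> List[int]:
    """Indexes where the value exceeds a fixed upper limit."""
    return [i for i, v in enumerate(values) if v > limit]


def mad_anomalies(values: List[float], window: int = 10, k: float = 5.0) -> List[int]:
    """Indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        recent = values[i - window:i]
        med = median(recent)
        mad = median([abs(v - med) for v in recent])
        # A perfectly flat window has MAD 0; use a tiny spread so that
        # any change from a flat line is still flagged.
        spread = mad if mad > 0 else 1e-9
        if abs(values[i] - med) / spread > k:
            anomalies.append(i)
    return anomalies
```

On a series that hovers around 100 and suddenly drops to 40, an upper threshold of 150 never fires, while the rolling detector flags the drop. That is the kind of case threshold alarming tends to miss.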