On March 14, we attended the Icinga Camp event in Berlin. As in previous years, it was a blast.
System diagnostics is a long journey: identifying a problem, analyzing the system status, and discovering the root causes. Technically, this means listing the factors that lead to the problem in order of importance; the root cause sits at the top of the list. This is what we do and how we work.
To make this clearer, let's take a Dynamics AX network as an example. The network contains several hosts with different functions. On the left we have three end users; the broker delivers the AX remote session and handles load balancing, the AX AOS processes the application logic, and finally the data is stored in the SQL Server.
What do we offer as Würth Phoenix? We offer a cloud solution that monitors these different components with dedicated measures, so each of these hosts is properly instrumented. The infrastructure of this cloud service is structured as follows: we install Telegraf on the RDC Broker, the AX RDS, the AX AOS, and the SQL Server. Telegraf collects the desired metrics and sends them to a server on the customer network: NetEye, powered by Icinga. At this stage, the "customer NetEye" is synchronized with the "Würth Phoenix NetEye" through NATS, which acts as a data channel. To manage and present the data effectively, we use InfluxDB and Grafana.
At this point we have our monitoring system in place. As you can see, for each element we collect different measures and different metrics.
InfluxDB ingests and stores the data, and Grafana lets us deliver graphs to our customers. The representation of the data is the real value of the solution: it feeds the anomaly detection and helps us understand which direction to take and where to focus.
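As a rough illustration of this step, here is a minimal sketch of how such metrics could be pulled from InfluxDB into Python for further analysis using the influxdb client library. The host name, database, measurement, and tag values below are invented for the example, not the actual schema:

```python
from influxdb import InfluxDBClient
import pandas as pd

# Hypothetical connection parameters: adjust host, port and database
# to match your own NetEye / InfluxDB setup.
client = InfluxDBClient(host="neteye.example.local", port=8086, database="telegraf")

# Fetch the last two weeks of CPU usage for one monitored host
# (measurement and tag names are assumptions).
query = (
    "SELECT mean(\"usage_user\") AS cpu_user "
    "FROM \"cpu\" WHERE \"host\" = 'ax-aos-01' "
    "AND time > now() - 14d GROUP BY time(5s)"
)
points = client.query(query).get_points()

# Organize the result as a Pandas time series for later processing.
df = pd.DataFrame(points).set_index("time")
print(df.head())
```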
How do we manage the data? With Python. It's easy to use and has a rich ecosystem of packages that let us handle the scenarios described above. We have several uncorrelated metrics, i.e. a multi-dimensional space: with NumPy we can create and work with tensors, with Pandas we can organize these tensors into time series, and with scikit-learn we can process the data and do machine learning.
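A minimal sketch of this toolchain, with metric names and values invented purely for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Invented example: three uncorrelated metrics sampled every 5 seconds.
index = pd.date_range("2019-03-14", periods=1000, freq="5S")
metrics = np.random.rand(1000, 3)            # a small tensor: samples x metrics

# Pandas turns the tensor into a labeled time series.
df = pd.DataFrame(metrics, index=index, columns=["cpu", "memory", "disk_io"])

# scikit-learn works directly on the underlying NumPy array,
# e.g. to normalize the features before machine learning.
X = StandardScaler().fit_transform(df.values)
print(X.shape)
```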
This slide aims to summarize what machine learning is. Machine learning algorithms are often categorized as supervised or unsupervised. Supervised machine learning algorithms apply what has been learned in the past to new data, using labeled examples to predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function to make predictions about the output values. After sufficient training, the system is able to provide targets for any new input. The learning algorithm can also compare its output with the correct, intended output and find errors in order to adjust the model accordingly.
In contrast, unsupervised machine learning algorithms are used when the training data is neither classified nor labeled. Unsupervised learning studies how systems can infer a function that describes a hidden structure in unlabeled data. The system does not determine the right output; instead, it explores the data and draws inferences that describe the hidden structures it finds.
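To make the distinction concrete, here is a minimal sketch with toy data (the arrays are invented): a supervised classifier learns from labeled examples, while an unsupervised algorithm only sees the raw samples:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy data: 4 samples with 2 features each.
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])                  # labels: 0 = normal, 1 = abnormal

# Supervised: the classifier learns the mapping from X to the given labels.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.85, 0.8]]))           # predicts the learned class

# Unsupervised: no labels, the algorithm discovers structure on its own.
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                           # clusters inferred from the data
```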
Our colleague Susanne has devised an anomaly detection model. It processes data host by host, taking data samples from the last two weeks with a sampling period of 5 seconds. As the machine learning algorithm, Susanne chose an unsupervised one, Isolation Forest, because the samples are not labeled as normal or abnormal. To turn the algorithm into a classifier with a separating border, there is a daily training phase. The output, as you can see on the slide, is a score between -0.5 and +0.5.
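This is not Susanne's actual code, but a minimal sketch of how such a model might look with scikit-learn's IsolationForest. The data loading is assumed; in scikit-learn, decision_function returns an anomaly score roughly in the range -0.5 to +0.5, where lower means more anomalous:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assume X holds the last two weeks of metrics for one host,
# sampled every 5 seconds (rows = samples, columns = metrics).
rng = np.random.RandomState(42)
X = rng.normal(size=(1000, 4))              # placeholder for real host data

# Daily training phase: refit the model on the rolling two-week window.
model = IsolationForest(contamination=0.01, random_state=42).fit(X)

# decision_function yields a score roughly between -0.5 and +0.5;
# the lower the score, the more anomalous the sample.
scores = model.decision_function(X)
anomalies = scores < 0
print(scores.min(), scores.max(), anomalies.sum())
```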
This is the model output. The orange bars indicate periods in which an anomaly is more likely to occur. The graphs can be expanded or collapsed in order to drill down into a specific anomaly in a specific period.
As you can see, it is possible to analyze the root cause.
Francesco has sketched a possible next step: an anomaly prediction model. It would consider the whole network and process data from all of its hosts. Principal Component Analysis (PCA) would extract the most relevant features for analysis: it reduces the dimensionality of the data, and with it the amount of data to manage, the computing power required, and the size of the problem to analyze. In this case the algorithm would be supervised, with samples labeled normal or abnormal according to a cause-and-effect logic: if we observe a certain pattern now, we expect a certain status in the future. On this basis we would train a classifier to predict anomalies.
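A minimal sketch of such a pipeline, assuming the network-wide metrics and their labels are already available as arrays (the shapes, the placeholder data, and the choice of classifier are all assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Assume X holds metrics from the whole network (rows = time windows,
# columns = metrics from all hosts) and y the labels (0 = normal, 1 = abnormal).
rng = np.random.RandomState(0)
X = rng.normal(size=(500, 40))              # placeholder network-wide features
y = rng.randint(0, 2, size=500)             # placeholder labels

# PCA reduces dimensionality; the classifier then learns to predict anomalies.
model = make_pipeline(PCA(n_components=10),
                      RandomForestClassifier(n_estimators=100))
model.fit(X, y)

print(model.predict(X[:5]))                 # predicted status for new patterns
```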
But how do we label the data samples? We have several networks, and training a classifier requires a large number of labeled samples. So, how can we get labeled samples automatically?
Alyvix can be an effective solution. It measures the performance of individual transactions and of the whole flow, so we know precisely whether a flow is fast or slow, and whether there are slowdowns or breakdowns. In short, Alyvix allows us to assign a digital label.
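As a rough illustration, assuming the Alyvix transaction times are available as a Pandas series, the labeling could be as simple as applying a threshold. The threshold value, metric name, and sample data below are invented, not Alyvix output:

```python
import pandas as pd

# Hypothetical Alyvix transaction times in seconds per monitored period.
transactions = pd.Series(
    [1.2, 1.4, 5.8, 1.3, float("nan")],      # NaN = transaction broke (unavailable)
    index=pd.date_range("2019-03-14", periods=5, freq="5T"),
    name="google_search_seconds",
)

# Label a period as abnormal if the transaction is too slow or failed entirely.
SLOW_THRESHOLD = 3.0                          # assumed threshold, not from Alyvix
labels = (transactions > SLOW_THRESHOLD) | transactions.isna()
print(labels.astype(int))                     # 1 = abnormal, 0 = normal
```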
It is 'Visual' because Alyvix looks at graphical interfaces: if you can see something on your screen, Alyvix can see it too. It is 'Synthetic' because Alyvix behaves like a human user: if you can synthesize something (e.g. a musical instrument, a vitamin), it means you can reproduce it artificially, and that is exactly what Alyvix does by synthesizing graphical application states and the way to interact with them. Finally, it is a 'Monitoring' system because Alyvix (properly integrated with Icinga) keeps track of the performance measures of each application transaction in a given user interaction flow.
Alyvix provides GUI tools to design any application transaction, in terms of both its graphical aspects and its interaction modes. At its core, Alyvix relies on the following open source stack: Python as the programming language, Robot Framework for desktop automation, OpenCV and Pillow for image processing, Tesseract OCR for text recognition, and PyQt for GUI programming.
At the end of the day, what we would like to do is completely translate an entire user interaction flow. By synthesizing user transaction flows we obtain Alyvix keyword flows. In practice, what we get is a list of keywords in the Alyvix editor, i.e. an executable test case. Under the hood, keywords are Python methods within a Python module, the so-called AlyvixProxies of the test case.
Here is an example with a web service: getting results for a Google search. Alyvix opens a browser pointing at Google, then continuously tries to detect the target object. When the object appears on screen, Alyvix records the time elapsed and interacts with it, and this repeats until the end of the test case. The important thing to highlight is that the Alyvix engine is designed to output net performance: image processing, detection, and interaction times are excluded from the measurement. In other words, it has a precise and accurate measurement engine.
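This is not Alyvix's actual engine code, but a simplified sketch of the idea of net timing: the wait for the object to appear is measured, while the time spent on the detection itself is subtracted. The detect_on_screen helper and the simulated timings are entirely hypothetical:

```python
import time

APPEAR_AFTER = 3.0                            # simulated: object "appears" after 3 s

def detect_on_screen(target, started_at):
    """Hypothetical stand-in for image-based detection of a GUI object."""
    time.sleep(0.1)                           # pretend image processing takes 100 ms
    return time.perf_counter() - started_at >= APPEAR_AFTER

def measure_net_appearance(target, timeout=30.0, poll_interval=0.2):
    """Return the net time until `target` appears, excluding processing overhead."""
    start = time.perf_counter()
    processing = 0.0
    while time.perf_counter() - start < timeout:
        t0 = time.perf_counter()
        found = detect_on_screen(target, start)   # screenshot + template matching
        processing += time.perf_counter() - t0
        if found:
            # Net performance: elapsed wall time minus the detection overhead.
            return (time.perf_counter() - start) - processing
        time.sleep(poll_interval)
    raise TimeoutError(f"{target!r} did not appear within {timeout} seconds")

print(measure_net_appearance("google_search_results"))
```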
The goals of synthetic monitoring are: to check the availability of all defined transactions of a given use case (carrying out a task with any kind of application through its GUI), and to measure the response times of all defined transactions, until one of them breaks (i.e. it becomes unavailable because it was never 'painted' on screen).
This is an example of the final result. We can detect service downtimes and latency spikes, and we can assess the quality of the service from the end user's point of view. We can therefore use our synthetic data to assign a digital label and, consequently, to train our machine learning algorithm.
We would integrate synthetic users into the Dynamics AX network.
Putting the previous slides together, this would be the architecture of the anomaly prediction model.