SYSTEM Diagnostics
A Deeper Understanding
1… more than software© Würth Phoenix
Francesco Melchiori
Product Manager and Data Scientist
Würth Phoenix S.r.l.
• IT and consulting company of the Würth Group
• Headquartered in Italy, with a Europe-wide presence and more than 160 highly skilled employees
• International experience in business software and IT management
• Core competencies in trading processes, wholesale distribution and logistics
• Microsoft Gold Certified Partner, ITIL certified
2
ABOUT WÜRTH PHOENIX
Facts & figures
• More than 1,200 customers worldwide
• Over 130 successfully implemented ERP and CRM/SFA projects
• 400 NetEye customers
• HQ in Italy
We improve business productivity by delivering world-class software solutions and a team of highly motivated and skilled IT experts
3
It’s a long way to the system Diagnostics
Anomaly Detection: real-time alert
Anomaly Prediction: early alert
Anomaly Diagnostics: root causes
4
dynamics AX network
[Diagram: End users 1 to 3, RDCBroker, AX RDS 1 to 3, AX AOS 1 to 3, SQL Server]
5
NetEye ITOA
[Diagram: Telegraf agents (T) on the hosts of the customer network; NATS connecting the customer network to the Würth Phoenix network; InfluxDB and Grafana]
6
system Measurements: Remote Desktop Layer
▸ Processor
Percent_Processor_Time
Percent_Privileged_Time
▸ Memory
Available_KBytes
Page_Faults_persec
Pages_persec
Pool_Nonpaged_Bytes
Pool_Paged_Bytes
▸ Disk
Avg._Disk_sec/Read
Avg._Disk_sec/Write
Disk_Read_Bytes_persec
Disk_Reads_persec
Disk_Write_Bytes_persec
Disk_Writes_persec
▸ Network
Bytes_Received_persec
Bytes_Sent_persec
Bytes_Total_persec
Current_Bandwidth
Output_Queue_Length
▸ System
Processor_Queue_Length
7
system Measurements: Application Object Server Layer
▸ Processor
Percent_Processor_Time
Percent_Privileged_Time
▸ Memory
Available_KBytes
Pages_persec
▸ System
Context_Switches_persec
▸ Paging
Percent_Usage
▸ .NET
Allocated_Bytes_persec
Percent_Time_in_GC
▸ AOS
active_sessions
number_of_bytes_received_by_server
number_of_bytes_sent_by_server
number_of_server_requests
number_of_client_requests
number_of_client_requests_per_second
total_number_of_selects_on_cached_tables
total_number_of_deletes_from_data_cache
total_number_of_remove_oldest
total_number_of_hits
total_number_of_misses
total_number_of_clears
total_number_of_clears_by_aos
8
system Measurements: SQL Server Layer
▸ Processor
Percent_Processor_Time
Percent_Privileged_Time
▸ Memory
Available_KBytes
▸ Disk
Avg._Disk_sec/Read
Avg._Disk_sec/Write
Disk_Read_Bytes_persec
Disk_Reads_persec
Disk_Write_Bytes_persec
Disk_Writes_persec
▸ Buffer
Page_reads_persec
Page_writes_persec
Page_life_expectancy
Lazy_writes_persec
Readahead_pages_persec
▸ Transactions
Longest_Transaction_Running_Time
Version_Store_Size_(KB)
▸ SQL
Batch_Requests_persec
▸ Wait
Page_IO_latch_waits
9
Measurement charts
10
Python Data Science Stack
11
MACHINE LEARNING Concept
[Diagram: unsupervised learning vs. supervised learning. Time series data yield samples (s1, s2) described by features (d1, d2); in the supervised case, labels are added for classifier training, and a new sample (?) is then classified.]
12
MACHINE LEARNING WARS
• crawl social networks, scrape political posts, train fake news, fake pictures, even fake videos and finally hack the democracy of the country you dislike
• customize super annoying advertisements to bother your website guests
• design medical devices to predict epileptic seizures and heart strokes or to detect cancer cells at an early stage and finally save human lives
• code email spam filters
• classify Earth scans to track the health of threatened habitats
13
Susi Anomaly detection Model
• host-by-host data processing
• samples from last 2 weeks
• sampling period of 5’’
• unsupervised learning: Isolation Forest
• training phase every day
• runtime classification every 10’
• training data is not labelled
• training data has to contain outliers
• trained classifier is designed to detect outliers
• classifier output: score on a scale from -.5 through 0 to +.5 (normal / abnormal)
“unknown end user performance of lonely and sometimes problematic nodes”
14
Anomaly detection visualization
[Chart labels: moving mean & value range; anomaly score; historic data; difference]
15
Anomaly detection visualization
16
NEXT Anomaly PREDICTION MODEL
• host network data processing
• Principal Component Analysis extracts the most relevant features (dimensionality reduction)
• supervised learning: Support Vector Machine
• training data is labeled
• labels for several classes of known issues
• trained classifier is designed to predict anomalies
• classifier output: normal in future, issue A in future, issue B in future
“known end user performance of connected and sometimes problematic nodes”
[Diagram: timeline from now to future with samples labeled N, A, N]
17
labeling dilemma
[Diagram: runtime prediction on a sequence of samples labeled N or A]
• training a classifier needs a number of labeled samples
• labeling the samples manually is difficult
• how can I get labeled samples in an automatic way?
Visual Synthetic Monitoring
18© Würth Phoenix 2018
Visual: Alyvix looks at graphical interfaces
Synthetic: Alyvix behaves like human users
Monitoring: Alyvix tracks transaction performance
alyvix
19
Alyvix provides GUI tools to design any app transaction
alyvix designer and editor
20
End user interaction flow
↓
List of transactions
↓
Test case
alyvix test case building
21
1. detects object
2. takes its time
3. interacts with it
alyvix automation
22
1. Check AVAILABILITY
2. Measure RESPONSIVENESS
[Diagram: RDWebAccess and Word (virtualized) transactions, each either displayed or unavailable]
alyvix measurement
23
service downtimes
latency spikes
alyvix End User Experience monitoring
[Chart: timeline of measurements labeled N or A]
24
synthetic dynamics AX network
[Diagram: Synth users 1 to 3, RDCBroker, AX RDS 1 to 3, AX AOS 1 to 3, SQL Server]
25
NEXT Anomaly PREDICTION ARCHITECTURE
[Diagram: customer network and Würth Phoenix (WP) network linked by NATS, with InfluxDB and Grafana producing an early alert]
26
Thanks for your attention!

Icinga Camp, Berlin 2019


Editor's Notes

  1. It's a long way to System Diagnostics: identifying a problem, analyzing the system status, and discovering the root causes. Technically, this means listing the factors that lead to the problem in order of importance, with the root cause at the top of the list. This is how we approach the problem.
  2. To make this clearer, take a Dynamics AX network as an example. The network contains several hosts with different functions. On the left are three end users. The broker delivers the AX remote sessions and balances the load. The AX AOS hosts process the application logic. Finally, the data is stored in the SQL Server.
  3. What does Würth Phoenix offer? A cloud solution that monitors these hosts through a range of measurements. The infrastructure of this cloud service is structured as follows. We install Telegraf on the RDC Broker, the AX RDS hosts, the AX AOS hosts and the SQL Server. Telegraf collects the desired metrics and sends them to a server. The server on the customer network is NetEye, powered by Icinga. The customer NetEye is then synchronized with the Würth Phoenix NetEye through NATS, which serves as the data channel. InfluxDB and Grafana are used to store and visualize the data.
  4. This is our monitoring system. As you can see, each layer has its own set of measurements and metrics.
  5. InfluxDB collects and stores the data, and Grafana lets us deliver graphs to customers. The representation of the data is the real value of the solution: it is useful for refining the anomaly detection, and it helps us understand where to focus and which direction to take.
  6. How do we manage the data? With Python: it is easy to use and has many packages that let us handle the situations described so far and work with tensors. We have many uncorrelated metrics, which span a multidimensional space. With NumPy we can create and manipulate tensors. With Pandas we can organize these tensors as time series. With scikit-learn we can process the data and do machine learning.
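As a minimal sketch of this stack, the snippet below builds a NumPy array of synthetic samples, wraps it in a Pandas DataFrame indexed by time, and downsamples it. The metric names are borrowed from the measurement slides; the values, the one-hour window and the one-minute resampling are illustrative assumptions, not the production pipeline.

```python
import numpy as np
import pandas as pd

# Synthetic values standing in for real Telegraf measurements (assumption).
rng = np.random.default_rng(seed=0)
n_samples = 720  # one hour of data at a 5-second sampling period
values = rng.normal(loc=[30.0, 5.0], scale=[5.0, 1.0], size=(n_samples, 2))

# NumPy holds the raw samples; Pandas organizes them as a time series.
index = pd.date_range(
    "2019-01-01", periods=n_samples, freq=pd.Timedelta(seconds=5)
)
metrics = pd.DataFrame(
    values,
    index=index,
    columns=["Percent_Processor_Time", "Processor_Queue_Length"],
)

# Downsample to one-minute means, a common step before feature extraction.
per_minute = metrics.resample(pd.Timedelta(minutes=1)).mean()
print(per_minute.shape)
```

The resulting DataFrame can be passed directly to scikit-learn estimators, which accept NumPy arrays and DataFrames alike.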
  7. This slide summarizes what machine learning is. Machine learning algorithms are often categorized as supervised or unsupervised. Supervised algorithms use labeled examples to apply what has been learned in the past to new data and predict future events. Starting from the analysis of a known training dataset, the learning algorithm produces an inferred function that makes predictions about the output values. After sufficient training, the system is able to provide targets for any new input. The learning algorithm can also compare its output with the correct, intended output and find errors in order to modify the model accordingly. In contrast, unsupervised algorithms are used when the training data is neither classified nor labeled. Unsupervised learning studies how systems can infer a function that describes a hidden structure in unlabeled data. The system does not figure out the right output; it explores the data and draws inferences that describe its hidden structures.
  8. Our colleague Susanne has devised an anomaly detection model. It processes data host by host and uses the data samples from the last two weeks. The sampling period is 5’’ (5 seconds). As the machine learning algorithm, Susanne chose an unsupervised one (Isolation Forest), because the data is not labeled as normal or abnormal. To turn the algorithm into a classifier with a separating boundary, there is a training phase every day. The output, as you can see on the slide, is a score between -0.5 and +0.5.
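The following is a minimal sketch of this kind of detector using scikit-learn's `IsolationForest`. It is not Susanne's actual model: the two synthetic features, the cluster parameters and the `contamination` value are assumptions made only for illustration. In scikit-learn, `decision_function` returns negative values for outliers and positive values for inliers, which is one way to obtain a score centered on zero like the one on the slide.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=42)

# Unlabeled training data for one host: mostly normal samples plus a few
# outliers, since the training data has to contain outliers.
normal = rng.normal(loc=[30.0, 2.0], scale=[3.0, 0.5], size=(1000, 2))
outliers = rng.normal(loc=[95.0, 20.0], scale=[2.0, 2.0], size=(20, 2))
training = np.vstack([normal, outliers])

# contamination is the assumed fraction of outliers in the training data.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
model.fit(training)

# Runtime classification of new samples.
new_samples = np.array([[31.0, 2.1], [97.0, 22.0]])
scores = model.decision_function(new_samples)  # negative means abnormal
labels = model.predict(new_samples)            # +1 normal, -1 abnormal
print(scores, labels)
```

In a setup like the one described, `fit` would run once a day on the last two weeks of samples, and `decision_function` would run at every classification interval.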
  9. This is the model output. The orange bars mark the periods in which an anomaly is more likely to occur. The graphs can be expanded or collapsed to examine a specific anomaly in a specific period.
  10. As you can see, it is possible to analyze the root cause.
  11. Francesco proposed a possible next step: an anomaly prediction model. It would consider the whole network and process the data of all its hosts. Principal Component Analysis (PCA) would extract the most relevant features, reducing the dimensionality and therefore the amount of data to handle, so the required computing power would decrease along with the size of the problem. In this case the algorithm would be supervised: samples are labeled as normal or abnormal according to a cause-and-effect logic, so that a certain pattern now corresponds to a certain status in the future. We would then train a classifier to predict anomalies.
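A sketch of how PCA and a Support Vector Machine could be chained with scikit-learn is shown below. The data is synthetic, and the number of metrics, the three classes, the number of principal components and the RBF kernel are all assumptions chosen for illustration; the proposed model itself is not specified at this level of detail.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(seed=1)

# Synthetic network-wide samples with 20 metrics each (assumption).
# Label 0 = normal in future, 1 = issue A in future, 2 = issue B in future.
n_per_class, n_features = 200, 20
centers = rng.normal(scale=4.0, size=(3, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat([0, 1, 2], n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale, reduce to a few principal components, then classify with an SVM.
classifier = make_pipeline(
    StandardScaler(), PCA(n_components=3), SVC(kernel="rbf")
)
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the steps in a pipeline keeps the scaling and PCA fitted on the training data only, so the same transformation is applied unchanged at prediction time.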
  12. But how do we label the data samples? We have several networks, and training a classifier needs a number of labeled samples. So how can we get labeled samples automatically? Alyvix can be an effective solution. It measures the performance of transactions and flows, so we know precisely whether a flow is fast or slow and whether there are slowdowns or breakdowns. In the end, Alyvix lets us assign a digital label.
  13. It is ‘Visual’ because Alyvix looks at graphical interfaces: if you can see something on your screen, Alyvix can see it too. It is ‘Synthetic’ because Alyvix behaves like a human user. If you can synthesize something (e.g. a musical instrument or a vitamin), it means you can reproduce it artificially. That is exactly what Alyvix does: it synthesizes the graphical states of an application and the way to interact with them. Finally, it is a ‘Monitoring’ system because Alyvix (with proper integration in Icinga) keeps track of the performance measurements of each application transaction in a given user interaction flow.
  14. Alyvix provides GUI tools to design any application transaction in terms of its graphical appearance and interaction modes. At its core, Alyvix relies on the following open stack of libraries: Python as the programming language, Robot Framework for desktop automation, OpenCV and Pillow for image processing, Tesseract OCR for text recognition, and PyQt for GUI programming.
  15. At the end of the day, what we want is a complete translation of entire user interaction flows. By synthesizing user transaction flows we obtain Alyvix keyword flows. In practice, we get a list of keywords in the Alyvix editor, that is, an executable test case. The keywords are Python methods within a Python module, the so-called AlyvixProxies of the test case.
  16. This is an example with a web service: getting results for a Google search. Alyvix launches a browser pointed at Google, then continuously tries to detect a given object. When the object appears on screen, Alyvix records the elapsed time and interacts with the object. This repeats until the end of the test case. The important point is that the Alyvix engine is designed to output net performance, excluding image processing, detection and interaction times. The measurement engine is therefore precise and accurate.
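The detect, time, interact loop can be illustrated with a small sketch. This is not the Alyvix API: `wait_for_object`, `detect` and `clock` are hypothetical names, and excluding the cost of the successful detection call is just one simple way to approximate a "net" measurement. A fake clock is used so that the example is deterministic.

```python
import time
from typing import Callable, Optional


def wait_for_object(
    detect: Callable[[], bool],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Poll `detect` until it succeeds and return the elapsed seconds.

    The elapsed time is taken when the successful attempt started, so the
    cost of that detection call is not counted. Returns None when the
    object does not appear within the timeout.
    """
    start = clock()
    while True:
        attempt_started = clock()
        if attempt_started - start > timeout_s:
            return None
        if detect():
            return attempt_started - start


# Simulated screen: the object appears on the third detection attempt, and
# every clock reading advances a fake timer by 0.5 s.
ticks = iter(x * 0.5 for x in range(100))
attempts = iter([False, False, True])
elapsed = wait_for_object(
    lambda: next(attempts), timeout_s=10.0, clock=lambda: next(ticks)
)
print(elapsed)

# An object that never appears yields None, i.e. the transaction broke.
late_ticks = iter(x * 0.5 for x in range(100))
missing = wait_for_object(
    lambda: False, timeout_s=1.0, clock=lambda: next(late_ticks)
)
print(missing)
```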
  17. The goals of synthetic monitoring are: to check the availability of all defined transactions of a given use case (accomplishing a task with any kind of application through its GUI); and to measure the response times of all defined transactions, until one of them breaks (that is, it is unavailable because it was not ‘painted’ on screen).
  18. This is an example of the final result. We can detect service downtimes and latency spikes, and we can assess the quality of the service from the end user's point of view. Therefore, we can use our synthetic data to assign a digital label to each sample and then do machine learning with our algorithm.
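The labeling step could look like the sketch below. The function name, the use of `None` for an unavailable transaction, the threshold value, and the reading of N as normal and A as abnormal are all assumptions for illustration; the slides do not state how the labels are actually derived.

```python
from typing import List, Optional


def label_samples(
    response_times_s: List[Optional[float]],
    spike_threshold_s: float,
) -> List[str]:
    """Label each synthetic measurement as normal ("N") or abnormal ("A").

    None stands for a transaction that was unavailable (service downtime);
    a value above the threshold counts as a latency spike.
    """
    labels = []
    for value in response_times_s:
        if value is None or value > spike_threshold_s:
            labels.append("A")
        else:
            labels.append("N")
    return labels


# Hypothetical response times in seconds for one transaction.
measurements = [0.8, 0.9, None, 4.2, 1.0, 0.7]
labels = label_samples(measurements, spike_threshold_s=2.0)
print(labels)  # ['N', 'N', 'A', 'A', 'N', 'N']
```

Labels produced this way can be aligned by timestamp with the system metrics, giving the supervised classifier its training targets without manual work.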
  19. We would integrate synthetic users into the Dynamics AX network.
  20. Based on the previous slides, this would be the architecture of the anomaly prediction model.