This paper presents UtilML, a novel approach to resource utilization prediction in the computing continuum. UtilML uses Long Short-Term Memory (LSTM) neural networks, a machine learning technique, to forecast resource utilization. We demonstrate its effectiveness by evaluating it on data extracted from a real GPU cluster in a computing continuum infrastructure comprising more than 1800 computing devices. To assess the performance of UtilML, we compared it with two related approaches that use a Baseline-LSTM model. Furthermore, we analyzed the LSTM results against the User-Predicted values provided by GPU cluster owners for task deployment with estimated allocation values. The results indicate that UtilML outperformed user predictions by 2% to 27% for CPU utilization prediction. For memory prediction, the UtilML variants showed improvements of 17% to 20% compared to user predictions.
Machine Learning Based Resource Utilization Prediction in the Computing Continuum
1. MACHINE LEARNING BASED RESOURCE UTILIZATION PREDICTION IN THE COMPUTING CONTINUUM
Christian Bauer, Narges Mehran, Dr. Radu Prodan and Dr. Dragi Kimovski
[1] - HTTPS://CAMAD2023.IEEE-CAMAD.ORG/
6. CONTRIBUTIONS
Analysis of publicly available monitoring traces
Development of a proof-of-concept (POC) machine learning approach called UtilML that improves utilization prediction (CPU and memory)
Evaluation of different models based on regression metrics
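To make the last contribution concrete, here is a minimal sketch of how predictions could be scored with regression metrics. The slides do not name the metrics used, so mean absolute error (MAE) and root mean squared error (RMSE) are assumptions, and all sample values are hypothetical rather than taken from the traces.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: the average size of the prediction error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: weights large errors more heavily than MAE."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical CPU utilization values [%] for four tasks (not from the traces).
actual = [100.0, 200.0, 400.0, 800.0]
user_estimate = [400.0, 600.0, 600.0, 600.0]
model_prediction = [120.0, 250.0, 350.0, 700.0]

print("user  MAE:", mae(actual, user_estimate), "RMSE:", rmse(actual, user_estimate))
print("model MAE:", mae(actual, model_prediction), "RMSE:", rmse(actual, model_prediction))
```

Lower values are better for both metrics. In this made-up sample the model prediction has the smaller error on both.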
7. THE SCENARIO
Distributed computing resources are (often) managed by a resource manager
This resource manager accepts requests from users and allocates resources based on the user estimations
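The scenario above can be sketched as a toy resource manager that reserves exactly what each user estimates. This is an illustration only: the `Request` and `ResourceManager` names, the capacities, and the units are hypothetical and do not describe the cluster's actual scheduler.

```python
from dataclasses import dataclass

@dataclass
class Request:
    task_id: str
    cpu_estimate: float  # user-estimated CPU [%] (hypothetical unit)
    mem_estimate: float  # user-estimated memory [GB] (hypothetical unit)

class ResourceManager:
    """Toy manager that allocates purely on user estimates."""

    def __init__(self, cpu_capacity, mem_capacity):
        self.cpu_free = cpu_capacity
        self.mem_free = mem_capacity
        self.allocations = {}

    def submit(self, req):
        """Reserve the estimated resources, or reject the request if they do not fit."""
        if req.cpu_estimate > self.cpu_free or req.mem_estimate > self.mem_free:
            return False
        self.cpu_free -= req.cpu_estimate
        self.mem_free -= req.mem_estimate
        self.allocations[req.task_id] = req
        return True

manager = ResourceManager(cpu_capacity=1000.0, mem_capacity=64.0)
accepted_a = manager.submit(Request("task-a", cpu_estimate=600.0, mem_estimate=16.0))
accepted_b = manager.submit(Request("task-b", cpu_estimate=600.0, mem_estimate=16.0))
print(accepted_a, accepted_b)  # True False
```

The second request is rejected because the first reservation leaves only 400 of the 1000 CPU units free. If task-a actually needed far less than its estimate, that reserved capacity would sit idle, which is why more accurate utilization predictions matter.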
8. COMPUTING CONTINUUM
Consists of a combination of cloud, fog and edge layers
[2] - Hossein Ashtari. Edge Computing vs. Fog Computing: 10 Key Comparisons. https://www.spiceworks.com/tech/cloud/articles/edge-vs-fog-computing, 2022. [Online; accessed 01-Nov.-2023]
17. EVALUATION RESULTS
The evaluation results contain a prediction performance analysis for the CPU and memory utilization of tasks:
User predictions
Baseline-LSTM predictions: a simple LSTM variant with
    Capacity of CPU and memory
    User prediction of CPU and memory
UtilML predictions
    More complex than Baseline-LSTM
    Additionally, it uses task knowledge
18. EVALUATION RESULTS – CPU [%]
Statistic   Actual CPU   UtilML-LSTM   Baseline-LSTM   User
mean        516.073      454.205       392.630         632.809
std         881.832      579.213       705.771         496.245
min         1.023        2.395         3.030           5
25%         103.632      129.884       97.586          400
50%         208.749      249.392       118.014         600
75%         528.076      662.490       281.472         600
max         7790.371     5634.635      5793.996        6400
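As a reading aid for the CPU table, the short sketch below computes how far each approach's mean prediction lies from the mean actual CPU utilization. It uses only the values in the mean row. The gap between means is a summary-level comparison chosen here for illustration, not one of the per-task regression metrics used in the evaluation.

```python
# Values copied from the "mean" row of the CPU [%] table.
actual_mean = 516.073
predicted_means = {
    "UtilML-LSTM": 454.205,
    "Baseline-LSTM": 392.630,
    "User": 632.809,
}

# Positive gap: overestimation on average; negative gap: underestimation.
gaps = {name: round(value - actual_mean, 3) for name, value in predicted_means.items()}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.3f}")
# UtilML-LSTM: -61.868
# Baseline-LSTM: -123.443
# User: +116.736
```

On average, the user estimates sit above the actual utilization and both LSTM variants sit below it, with the UtilML-LSTM mean closest to the actual mean.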