Distributed HPC monitoring

DO YUO FNID TIHS SMILPE TO RAED ?

      Q1            Q2            Q3            Q4
      X      Y      X      Y      X      Y      X      Y
   10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
    8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
   13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
    9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
   11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
   14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
    6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
    4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
   12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
    7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
    5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
Frank Anscombe 
1918 - 2001
Standard
Expected
ANSCOMBE'S QUARTET

Statistic    Q1                    Q2                    Q3                    Q4
Mean (X, Y)  9, 7.50               9, 7.50               9, 7.50               9, 7.50
Var (X, Y)   11, 4.125             11, 4.125             11, 4.125             11, 4.125
Cor          0.816                 0.816                 0.816                 0.816
Σ(^2)        13.7627               13.7763               13.7562               13.7425
LinReg       y = 0.5x + 3          y = 0.5x + 3          y = 0.5x + 3          y = 0.5x + 3
Coefficient  0.67                  0.67                  0.67                  0.67
Intercept    3.00009 ± 0.909545    3.00091 ± 0.909545    3.00245 ± 0.909545    3.00173 ± 0.909545
Slope        0.500091 ± 0.0953463  0.500000 ± 0.0953463  0.499727 ± 0.0953463  0.499909 ± 0.0953463
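
The summary statistics can be recomputed from the data table with the Python standard library alone. The following is a minimal sketch: the names `QUARTET` and `summarise` are illustrative and do not come from the slides; the numbers are copied from the quartet data in this deck.

```python
from math import sqrt
from statistics import mean, variance

# Q1 to Q3 share the same X column; Q4 has its own.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X4 = [8.0] * 7 + [19.0] + [8.0] * 3

QUARTET = {
    "Q1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Q2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Q3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Q4": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return mean, sample variance, Pearson correlation and least-squares fit."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),
        "var_y": variance(ys),
        "cor": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summarise(xs, ys)
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

All four data sets produce nearly identical numbers, which is the point of the quartet: summary statistics alone do not reveal how differently the data are shaped.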

[Figure: scatter plots of Q1 to Q4, each with its fitted regression line]
Q1: y = 0.5001x + 3.0001, R² = 0.6665
Q2: y = 0.5x + 3.0009, R² = 0.6662
Q3: y = 0.4997x + 3.0025, R² = 0.6663
Q4: y = 0.4999x + 3.0017, R² = 0.6667

BATCH PROCESSING TIMELINE
t0 to t10: Metrics Collection
t10 to t45: Metrics Parsing
t45 to t60: Action

STREAM PROCESSING TIMELINE
t0 to t7: Metrics Parsing
t7 to t15: Action
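
The difference between the two timelines can be sketched in a few lines. This is an illustration under stated assumptions, not the method used in the talk: the threshold, the sample readings and the function names `batch_alerts`, `stream_alerts` and `counted` are all hypothetical.

```python
from typing import Iterable, Iterator, List

THRESHOLD = 90.0  # hypothetical alert threshold


def batch_alerts(samples: Iterable[str], threshold: float = THRESHOLD) -> List[int]:
    """Batch: collect every sample, then parse them all, then act."""
    collected = list(samples)                # collection must finish first
    parsed = [float(s) for s in collected]   # parsing starts only afterwards
    return [i for i, v in enumerate(parsed) if v > threshold]


def stream_alerts(samples: Iterable[str], threshold: float = THRESHOLD) -> Iterator[int]:
    """Stream: parse each sample as it arrives and act on it immediately."""
    for i, raw in enumerate(samples):
        if float(raw) > threshold:
            yield i


def counted(samples: Iterable[str], counter: List[int]) -> Iterator[str]:
    """Yield samples while counting how many have been consumed so far."""
    for s in samples:
        counter[0] += 1
        yield s


readings = ["12.5", "95.1", "40.0", "99.9", "10.0"]

stream_seen = [0]
first_stream_alert = next(stream_alerts(counted(readings, stream_seen)))

batch_seen = [0]
all_batch_alerts = batch_alerts(counted(readings, batch_seen))
```

In this sketch the stream version raises its first alert after consuming two readings, while the batch version has to consume all five before it can act on anything.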

#2 NO TIME TO CONSULT HQ
[Diagram: nodes A, B, C]

SETUP
NUDNIK - INFRASTRUCTURE HEARTBEAT
▸ https://pypi.org/project/nudnik/
▸ pip install nudnik
▸ https://aur.archlinux.org/packages/nudnik/
▸ pacman -S nudnik
▸ git clone https://github.com/salosh/nudnik

NUDNIK
MESSAGE TYPES - BASELINE
[Diagram: nodes A to E exchanging messages over links labelled 10ms or 20ms]
▸ Tiny fingerprint
▸ gRPC / REST
▸ Multiplatform

NUDNIK
MESSAGE TYPES - LOAD
[Diagram: nodes A to E exchanging messages over links labelled 10ms or 20ms]
▸ CPU
▸ Memory
▸ Disk
▸ Network
▸ Executable

NUDNIK
MESSAGE TYPES - CHAOS
[Diagram: nodes A to E, with links labelled 10%, +? ms, ? % and 12ms]
▸ Set failure %
▸ Set latency
▸ .. Or randomise
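
The idea of a configurable failure percentage and added latency can be illustrated with a small generic wrapper. This is a sketch only and is not nudnik's API or configuration format: `with_chaos`, `ChaosError` and all parameter names are hypothetical.

```python
import random
import time


class ChaosError(RuntimeError):
    """Raised when a failure is injected on purpose."""


def with_chaos(func, failure_pct=0.0, extra_latency_ms=0.0, rng=None, sleep=time.sleep):
    """Wrap func so that calls fail failure_pct percent of the time
    and every surviving call is delayed by extra_latency_ms milliseconds."""
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        # rng.random() is in [0, 1), so 100 always fails and 0 never fails.
        if rng.random() * 100.0 < failure_pct:
            raise ChaosError("injected failure")
        if extra_latency_ms > 0:
            sleep(extra_latency_ms / 1000.0)
        return func(*args, **kwargs)

    return wrapper


def ping():
    """Stand-in for a real request between two nodes."""
    return "pong"


# Roughly one call in ten fails; the rest get 12 ms of added latency.
flaky_ping = with_chaos(ping, failure_pct=10.0, extra_latency_ms=12.0,
                        rng=random.Random(42))
```

To randomise instead of fixing the values, the same wrapper could draw `failure_pct` or `extra_latency_ms` from `rng` on each call; that variant is left out here to keep the sketch short.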

NUDNIK
REPORTING
[Diagram: nodes A, B, C and TS]
▸ InfluxDB
▸ Elasticsearch
▸ Prometheus PG
▸ Text csv / TTY
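
As an illustration of what a reported sample could look like, the sketch below formats one latency measurement as an InfluxDB line-protocol record and as a CSV row. The measurement, tag and field names are hypothetical and are not nudnik's actual schema; the formatter also assumes that keys and values contain no spaces, commas or equals signs, so it does no escaping.

```python
import csv
import io


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Assumes float field values and names that need no escaping.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


def to_csv_row(src, dst, latency_ms, timestamp_ns):
    """Format the same sample as a single CSV row (no trailing newline)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([timestamp_ns, src, dst, latency_ms])
    return buf.getvalue().strip()


line = to_line_protocol(
    "heartbeat",                      # hypothetical measurement name
    {"src": "A", "dst": "B"},         # hypothetical tags
    {"latency_ms": 12.0},
    1700000000000000000,
)
row = to_csv_row("A", "B", 12.0, 1700000000000000000)
```

Here `line` is `heartbeat,dst=B,src=A latency_ms=12.0 1700000000000000000` and `row` is `1700000000000000000,A,B,12.0`.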

MULTILAYER DISTRIBUTION
[Diagram: A, B and C distributed across Node A and Node B]

SCALE PLANNING / TESTING
[Diagram: A, B, C and RuOK]
