The document discusses using knowledge graphs and machine learning to help analyze observability data and enable root cause analysis. It notes current challenges include large amounts of messy, inconsistent observability data and lack of labels. The document proposes using unsupervised or semi-supervised machine learning on known entities and relational data to build observability graphs linking metrics, alerts, services, etc. These graphs could help automate root cause analysis by surfacing related factors. Finally, it emphasizes that observability data is created by people for people, so tools should focus on how people interact with and make sense of the data.
29. The mirroring hypothesis
a.k.a. Conway’s Law
a.k.a. You ship your org chart...
Colfer, Lyra J., and Carliss Y. Baldwin. "The mirroring hypothesis: theory, evidence, and exceptions." Industrial and Corporate Change 25.5 (2016): 709-738.
34. Gore’s hypothesis
a.k.a. The prequel to Dunbar’s Number
a.k.a. The thing you heard about from The Tipping Point
Zhou, W-X., et al. "Discrete hierarchical organization of social group sizes." Proceedings of the Royal Society B: Biological Sciences 272.1561 (2005): 439-444.
Hamel, Gary, and B. Breen. "Building an innovation democracy: WL Gore." The future of management (2007).
45. The Situation
Negatives:
– Large-scale, messy, inconsistent observability data
– Labels are hard to come by
Positives:
– Domain knowledge
– Lots of user-interaction data
Approach:
– Unsupervised (or semi-supervised) machine learning
– Known entities
– Relational data
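One way to read this slide is that user-interaction data can stand in for missing labels: entities that people look at together are probably related. The sketch below is only an illustration under that assumption. The `sessions` log, its format, and the choice of Jaccard similarity are all hypothetical and are not taken from the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical interaction log: each session lists the known entities
# (dashboards, alerts, metrics) one user touched while investigating.
sessions = [
    ["Dashboard D", "Metric M", "Alert A"],
    ["Dashboard D", "Metric M"],
    ["Alert B", "Metric N"],
    ["Dashboard D", "Alert A"],
]

def cooccurrence_edges(sessions):
    """Count how often each unordered pair of entities shares a session."""
    pair_counts = Counter()
    entity_counts = Counter()
    for session in sessions:
        entities = sorted(set(session))  # dedupe and fix the pair order
        entity_counts.update(entities)
        pair_counts.update(combinations(entities, 2))
    return pair_counts, entity_counts

def jaccard(pair_counts, entity_counts, a, b):
    """Jaccard similarity of the sets of sessions containing a and b."""
    a, b = sorted((a, b))
    both = pair_counts[(a, b)]
    either = entity_counts[a] + entity_counts[b] - both
    return both / either if either else 0.0

pair_counts, entity_counts = cooccurrence_edges(sessions)
score = jaccard(pair_counts, entity_counts, "Dashboard D", "Metric M")
```

No labels are involved: the similarity scores come only from which known entities appear together, and pairs above some threshold could become weighted edges in a graph.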
53. Observability Graphs
Alert A Metric M
Dashboard D Team T
Service S Service R
Alert B
Metric N
Metric O Alert C
system.cpu.idle{role:R}
“[R] CPU is high on R!”
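The query and alert title on this slide suggest how edges could be derived from definitions that already exist: an alert references a metric, and the metric's tag points at a service. The following is a minimal sketch, assuming the query always has the shape `metric.name{tag:value,...}` and that the `role` tag names a service. The function names and edge labels are made up for illustration.

```python
import re

# Assumed query shape: metric.name{tag:value,tag:value}
QUERY_RE = re.compile(r"^(?P<metric>[\w.]+)\{(?P<tags>[^}]*)\}$")

def parse_query(query):
    """Split a query string into its metric name and a dict of tags."""
    match = QUERY_RE.match(query.strip())
    if match is None:
        raise ValueError(f"unrecognized query: {query!r}")
    tags = {}
    for item in match.group("tags").split(","):
        if not item.strip():
            continue
        key, _, value = item.partition(":")
        tags[key.strip()] = value.strip()
    return match.group("metric"), tags

def edges_for_alert(alert_title, query):
    """Derive (source, relation, target) edges from one alert definition."""
    metric, tags = parse_query(query)
    edges = [(("alert", alert_title), "monitors", ("metric", metric))]
    # Assumption: the `role` tag identifies the service emitting the metric.
    if "role" in tags:
        edges.append((("metric", metric), "reported_by", ("service", tags["role"])))
    return edges

edges = edges_for_alert("[R] CPU is high on R!", "system.cpu.idle{role:R}")
```

Run over every alert and dashboard definition, a parser like this would yield the known-entity edges without anyone labeling data by hand.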
58. Observability Graphs
Alert A Metric M
Dashboard D
Dashboard E
Team T
Service S Service R
Alert W Metric P
Alert B
Metric N
Metric O Alert C
Alert X Alert Y Alert Z
DB U
Service V
DB W
Metric L
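Once such a graph exists, surfacing related factors for root cause analysis can be as simple as walking outward from the alert that fired. The sketch below uses a breadth-first search limited to a few hops. The node names come from the slide, but the edges between them are hypothetical, because the transcript does not preserve the connections.

```python
from collections import defaultdict, deque

# Hypothetical edges: the slide lists the nodes, but the connections
# between them are assumed here for illustration only.
EDGES = [
    ("Alert A", "Metric M"),
    ("Metric M", "Service S"),
    ("Dashboard D", "Metric M"),
    ("Team T", "Dashboard D"),
    ("Service S", "DB U"),
    ("Alert B", "Metric N"),
    ("Metric N", "Service R"),
]

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def related(graph, start, max_hops=2):
    """Return {node: hop distance} for nodes within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    del distances[start]
    return distances

graph = build_graph(EDGES)
nearby = related(graph, "Alert A")
```

With these assumed edges, the entities within two hops of Alert A are Metric M, Service S, and Dashboard D. A real system would likely weight edges and rank results instead of treating every hop equally.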
62. CONCLUSIONS
Sam Cali
Observability data is created by people to be consumed by people.
Monitoring tools and data are useless if people can’t make sense of them.
By studying how people interact with this data, we can increase the observability of our systems.