The word observable entered the English language roughly 400 years ago, but the concepts of what it means to see, comprehend, and understand something have been debated since time immemorial. Starting in the 19th century, a series of postulates and criteria coalesced into control theory, and it is from this body of knowledge that we gained the word “observability”. Today, with the advent of complex, interconnected computer systems, that word has taken on new meanings and connotations—some useful, some detrimental, and some just plain confusing.
In this talk, we’ll mix a little history, a touch of philosophy, and a healthy dose of reality, to demystify what observability means to us as professional computer people. We’ll tear through the marketing material and unearth foundational principles that will help us to build better infrastructure, write better software, and promote healthier business practices. Finally, we’ll explore some potential new avenues for discussion and understanding.
Editor's Notes
Hi, my name's Dan and I'm an engineer on the Community team at Datadog. The aim of our brief time here together is to better understand the word "observability". It's a big word—13 letters, actually—and it gets thrown around a lot in our industry. But before we get into what it is (which is actually fairly complicated), let's start with what it isn't.
Let's start with monitoring. To be clear, monitoring is important, and you should have lots of it—and it's usually part of a broader observability strategy—but it's not in and of itself observability. Well, not exactly at any rate (as we'll discover).
Everybody loves dashboards right? And we look at (or observe) dashboards, so, that's observability, right? No. Again, dashboards can be useful, and can be part of that aforementioned strategy, but they're not what we're talking about today.
Also—and this one is important—devops is not it either. That said, the CAMS model (culture, automation, measurement, sharing) can and does fit very nicely with observability principles and practices—and that's something that I would encourage us all to talk about in the open spaces over the next two days. So now that we've established what observability isn't, let's talk about what it is. And we're going to start with… **next slide**
…a brief history lesson. This is a centrifugal governor. These were invented in the 17th century by Christiaan Huygens (pronounced `Ha-hruns`), who is perhaps best known as an astronomer, but like a lot of naturalists in the 1600s, he was basically an expert at everything. The governor was—and is—used to regulate the speed of an engine—by which I mean an engine in the classical mechanical sense, as in something that converts energy into mechanical force—in this case to regulate distance and pressure between millstones.
Along comes this man: James Clerk Maxwell. The modern world exists in large part due to this man. When you talk about the great physicists of all time, you've got Newton, Einstein, and Maxwell. I would need an hour just to list the things that Maxwell accomplished. What's pertinent to us today is that in 1868 he wrote a paper entitled "On Governors", which was an analysis of how centrifugal governors function. As a mere side-effect of this paper, a whole new field of research emerged, and that is **next slide**
If you look it up on Wikipedia, this is what it will tell you: Control theory deals with the control of dynamical systems in engineered processes and machines. The objective is to develop a model or algorithm governing the application of system inputs to drive the system to a desired state … and ensuring a level of control stability. Ok, I'm going to pause here for a moment. Why the history lesson? Because it's important to understand that control theory has been around for a long time. It's an extremely mature field of study, and one that is extremely focused on mechanical and industrial concepts. But it's also where we get the word observability. But before we get to that gold nugget, we need to talk about **next slide**
… this. In order to accomplish the aforementioned objectives of control theory, something called a controller is itself theorised. The controller examines a value as measured from a system at a given point in time, and compares it to a reference value. The difference between these values is called the "error signal" (sound familiar?). This signal can then be used to apply some sort of corrective behaviour, the aim of which is to bring the delta towards zero. How does it do this? Via a property of a system called observability.
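The controller loop described above can be sketched in a few lines of code. This is a toy proportional controller, not anything from a real control library: the names (`plant`, `control_step`, the gain `kp`) and the numbers are all invented here for illustration.

```python
def plant(state, correction):
    """A stand-in for the system being controlled: applying the
    correction simply shifts the measured value."""
    return state + correction

def control_step(measured, setpoint, kp=0.5):
    """One tick of the control loop: compare the measured value to
    the reference value and return a proportional correction."""
    error = setpoint - measured   # the "error signal"
    return kp * error             # corrective behaviour

measured, setpoint = 0.0, 10.0
for _ in range(20):
    measured = plant(measured, control_step(measured, setpoint))

# After repeated corrections, the delta has been driven towards zero.
assert abs(setpoint - measured) < 0.01
```

Each pass through the loop measures, compares against the reference, and applies a correction—exactly the examine/compare/correct cycle just described.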
This is the "classic" definition, taken directly from control theory. But there are lots of words in this definition that, themselves, are worth diving into, because every single word here means something specific.
Let's start with "measure". It means to "ascertain the size, amount, or degree of something", but also to "assess the importance, effect, or value of something". In other words, we can measure both quantity and quality, and this dual-nature is important to understand.
In English, "state" has a few definitions, but the one we want is "the particular condition that something is in at a specific time." The key words there are condition and time, and that last one is critical; as they say in life, timing is everything.
So, system. This is "a set of things working together as parts of a mechanism or interconnecting network", but it's also "a set of principles or procedures by which something is done", like a scheme or method. So it's either literally a complex thing, or figuratively, a way of describing a complex thing.
This brings us to knowledge, which, uh, is a big one (and honestly we'd need more than a few minutes to really get going), but for our purposes—here, today—let's go with this: "it is the sum of what is known".
Historically, output is related to production—like how much fabric a power loom can produce in 1850. But today we can also think of output as information—in particular, the information that is produced by a system (hopefully on purpose).
That basic definition of observability, then, belies a frankly incredible level of complexity. As I said, it's taken from control theory—which is important, because the definition taken in a vacuum (as it often is) is missing a key bit of context.
…and that's the State Observer. In control theory, this is the mechanism that actually takes the measurements of the inputs and outputs—that actually gathers the quantities and qualities and times and conditions—and provides data that can become knowledge.
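To make the state observer idea concrete, here's a minimal scalar sketch in the Luenberger-observer style: the observer can't read the system's hidden internal state, only its output, so it keeps a running estimate and corrects that estimate using the gap between the observed output and the output its estimate predicts. All of the values and names here are made up for illustration.

```python
a, c, gain = 0.9, 2.0, 0.4   # state dynamics, output map, observer gain

def step(x):
    """True (hidden) state dynamics: x' = a * x."""
    return a * x

def output(x):
    """What we can actually measure: y = c * x."""
    return c * x

x_true, x_est = 5.0, 0.0     # the observer starts with no knowledge
for _ in range(50):
    y = output(x_true)                               # measure the output...
    x_est = a * x_est + gain * (y - output(x_est))   # ...and correct the estimate
    x_true = step(x_true)

# The estimate has converged on the state we could never read directly.
assert abs(x_true - x_est) < 1e-3
```

The point isn't the maths—it's that the observer turns raw outputs into an estimate of internal state, which is precisely the "data that can become knowledge" step.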
Observers in control theory are all maths and algorithms and processes, and it gets deep fast. So, let's step back from the brink of madness and re-center ourselves. For us, as computer people, how can we understand observability in practical terms?
The Three Pillars (metrics, logs, and traces) is an outdated definition, but it's worth mentioning because it still pops up all over the place in docs and blog posts, and it was the first widely distributed and accepted interpretation of o11y as something that goes beyond monitoring. But today, as an industry, we've really moved past this basic definition.
It's about a wide variety of perspectives, about having many different ways of introspecting and interpreting and exploring and explaining complex systems. You need to look at both sides of the coin, except in this case, the coin exists in 72 dimensions and it's impossible for a human to actually perceive it all at once.
O11y is all about our capacity to ask questions in order to better understand and comprehend complex systems. There are two important things implicit here: you need data, and you need the means to interact with and make sense of that data. To be clear, those are two separate things (and you need both).
Once you have the data, and you have the ability to interact with that data in arbitrary ways, you can start to ask questions about things that you didn't know about ahead of time. And that's the difference between o11y and monitoring. Monitoring is about what you've already discovered; o11y is about what you have yet to explore.
So for us as technologists, we can—and should—have a more expansive understanding of the word observability. For us, it's more than a property—more than just state estimation via inference. It's about perspectives, and our capacity to understand and reason about our systems from different angles. It's also about data, and our capacity to both collect and interpret it. And finally it's about asking questions and gaining new insight—information that we didn't have before.
This is where we start to diverge from that basic, oft-repeated definition. There's more to it than self-governing industrial machinery. The capacity to be observed is necessary but insufficient; we must consider who or what is ultimately using this data, and their half of the relationship, as well.
In other words, the consumer of the information shares a relationship with what is being observed. And that relationship is important to recognise. Classical observability is today merely the starting point. We, as an industry, as technologists, as programmers and product managers and SREs, are poised for the next great evolution in how we understand and interact with complex systems.
I submit this as a talking point, for us here today, and for us as an industry, to consider, debate, and—hopefully—use as a vehicle to improve ourselves and the work that we do: that observability as a property of a system is important, but so is the consumer's capacity to make use of that property. Thank you.