Conrad Albrecht-Buehler at BayCHI: Heed or: How I Learned to Stop Monitoring and Love Situation Awareness
Technology users are technology observers as well, monitoring for problems or opportunities that might arise. Designing interfaces to support the monitoring of technology presents unique challenges, like detecting situations and knowing how and when to respond, coping with a changing operating environment, and the changing knowledge of the observer.
Conrad describes "Heed," a scale and framework to help observers of a system evaluate which situations need scrutiny and when. He gives an example heed-based interface that encourages the development of situation awareness. Learn how the framework and interface can be applied in three different scenarios: server performance, a business's finances, and user experience in a community forum.
To be more precise, I think that they’re incomplete.
I would classify monitoring tools into three types, based on how they’re designed to draw your attention and how much information they contain.
Reports contain a lot of information: information that’s focused on a subject and may include data analysis. But a report is not designed to draw your attention to it, and so it’s pretty easy to ignore.
By comparison, an alarm or alert is designed to direct your attention, but contains very little information: typically, just one bit of data.
Dashboards contain a lot of data, but it is typically unanalyzed and not necessarily organized around a subject or situation, so a dashboard carries less information than a report. Dashboards are, however, designed to direct your attention somewhat.
So, a way to think about monitoring UIs is on this sort of continuum.
And there’s this gap. For monitoring UIs to be complete, we need to create interfaces that bridge this gap.
I’ll make the argument that, to bridge the gap, what we need are interfaces that are designed primarily to direct attention, but contain some information: enough to know when and what to monitor. In other words, when to use dashboards and reports.
To make my case, I’m going to tell you three real user stories about monitoring, and the difficulties each person has. I’ll discuss the situations that are important to each of them, and how they each perform their monitoring. Each story is an example of a monitoring task that competes with many other responsibilities, and is one for which alarms and dashboards don’t always work.
The first story is about monitoring servers.
The second is about monitoring business performance by reading financial reports.
The third is about monitoring user experience in a discussion forum.
Our first story is about a CTO working in a small start-up. The company’s business is based on a web site, and he is in charge of everything technical at the company. In addition to his executive duties, like raising funds and doing interviews, he’s also the lead developer, in charge of the team creating the web applications. But he’s also the systems admin: he builds and maintains all the desktops and all of the servers the company needs. As a sys-admin, one of the things he monitors is the performance of his web, database, and mail servers.
He actually has powerful server dashboards but rarely uses them. His dashboards are a lot like this one.
So why doesn’t our CTO use these dashboards? It sure looks like a lot of great data. The problem is…
Our CTO has a whole lot of other responsibilities besides monitoring, and skimming and interpreting the dashboard takes time. He’s not an air traffic controller who can primarily focus on his monitoring task. But maybe we can learn something from those air traffic controllers…
Air traffic controllers do focus their attention, but have a multitude of information that they have to manage.
They, and the designers of their tools, rely on a concept most of you are probably familiar with called Situation Awareness. Mica Endsley has done a lot of great work in this area, and gives us the canonical definition of Situation Awareness: "the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future." But for today I prefer Jeannot, Kelly, & Thompson’s: “what you need to know not to be surprised.”
Jeannot, E., Kelly, C. & Thompson, D. (2003). The development of situation awareness measures in ATM systems. Brussels: Eurocontrol.
At a very high level, to help develop SA, monitoring interface designers should guide the user’s attention to information relevant to the user’s goal, help interpret the information’s meaning, and help the user make a forecast. (Endsley, Bolte, & Jones, 2003. Designing for Situation Awareness: An Approach to User-Centered Design.)
Dashboards can sometimes help interpret the data’s meaning, and when they include time-domain data like graphs, they can actually help make a forecast. Stephen Few has written a lot of good material on dashboard design, as has, of course, Edward Tufte.
But even a well-designed dashboard requires mental effort to parse to find the information relevant to a goal. This is an issue of grouping more than of highlighting.
What makes the data relevant, together, is the bearing it has on a particular situation. For example: the application server bottlenecking, the server being compromised, the network being misconfigured, the storage reaching capacity, etc.
But we can at least define a couple of design goals at this point: to try to reduce the mental effort required to parse the monitoring UI for info relevant to a goal, perhaps through customization and filtering of the data by situation. These aren’t exactly lofty design goals, and in fact, we do see them enacted, but maybe not often enough. But it’s a start. So, back to our CTO: if he’s not using his dashboards to monitor his servers, what is he using?
He works on his servers at various times throughout the day: updating application code, installing a patch, etc. When he happens to be logged into one of the servers, I noticed that he usually runs ‘uptime’ to check that server’s load.
He samples the server load, and if he happens to catch it at a moment of high load, and if he has time, he might investigate the situation. In other words, he is asking: is the server load high enough that he needs to investigate? “High enough.” What is “high enough”?
That’s pretty subjective. What might be high enough to me may not be high enough to him. He’s making a subjective evaluation from objective information. If this is what he actually does to monitor his servers, we as designers should support that.
Implicit in the question is another one: is there enough time for him to investigate? Remember he has a lot of other responsibilities.
Essentially, what he’s really asking is: is it important enough for me to investigate? This is the cornerstone of how I think we can improve monitoring UIs. We need another kind of monitoring UI to tell us that there’s something important enough to attend to and begin actively monitoring. Doesn’t that sound suspiciously like an alarm? Couldn’t our CTO just use an alarm to tell him his server load is high enough?
As I mentioned in the beginning, alarms are actually another monitoring UI. But alarms have their own set of problems:
You never know that they’re coming.
They force you to act, even when you can’t.
If you can’t address them, what do you do? Snooze, cancel, ignore?
Our CTO also has alarms to tell him about incidents, but he mostly ignores them too. He filters them into their own mailbox, and all but never looks at them. Sometimes, if something goes wrong, he’ll use the cache of alerts to trace back and try to find a root cause, but then he’s using them as an analysis tool, not a monitoring tool. To be fair, he does use one sort of alarm regularly, though:
Sometimes, someone from his customer support team tells him that they’re getting angry calls. You could say that he lets his customers monitor the system for him and act as an alert system. But that’s not ideal. Let’s look more closely at why an alarm doesn’t work for his uptime monitoring method.
Here’s a quick graph of server load. So, is the load high? Here’s where it gets subjective. To me, if it’s not above 0.9 it’s not high. To him, above 0.7 is high.
Here’s the problem with using an alarm: if it goes off, he won’t necessarily have time to drop everything to investigate. And when he does have time, the alarm doesn’t tell him anything unless it goes off.
The root of the problem is that this is what an alarm does. It divides the data range into data he should ignore
and data he should attend to.
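To make that binary split concrete, here is a minimal sketch in Python. The function name `alarm` and the sample load readings are made up for illustration; only the 0.7 and 0.9 thresholds come from the talk.

```python
def alarm(load: float, threshold: float) -> bool:
    """A classic alarm: one bit of output, True only above the threshold."""
    return load > threshold


# Hypothetical load samples taken over a day (not real data).
loads = [0.40, 0.65, 0.72, 0.95]

# The CTO considers anything above 0.7 high; the speaker uses 0.9.
his_alarm = [alarm(x, 0.7) for x in loads]
my_alarm = [alarm(x, 0.9) for x in loads]

print(his_alarm)
print(my_alarm)
```

Note that a reading of 0.65 and a reading of 0.40 look identical to either alarm, even though one of them is much closer to trouble than the other.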
Remember, he said that he’d investigate if the load was high enough. An alarm just tells him that it’s high. And what’s more, it only tells him that once it’s happened, nothing before. But the graph tells us more. He can see how the load is changing, where it is, and where it’s been. But at a cost: he has to interpret the graph, and he’s already dismissed using his dashboards for that reason. Furthermore, it takes up a lot of screen real estate. In part, that’s why network operations centers look like this:
With monitoring displays as far as the eye can see. But our startup CTO can’t afford to do this. So maybe there’s something in between a graph and an alarm we can come up with.
Instead, what if we change how the data range is divided up? From this to
Instead of showing the whole graph, we could just show a region prior to the alarm threshold. Then the position in the region can identify the importance of attending to a situation. If the data is below the ignore threshold, it might as well be at the ignore boundary. Likewise, once the load is above the attend threshold it doesn’t matter how high it is; we can consider any such reading to be at the attend boundary.
The range in between enables us to indicate that there is a degree of importance to attending to the situation. I refer to this range as “heed.” The closer the reading gets to the alarm threshold, the more important it becomes to attend to the situation. And we can just use the familiar slider widget to indicate the heed.
Now our CTO has a way of knowing if the load is high enough. Now we can offer him a simple dashboard that relies on the way he already monitors his servers. One that doesn’t give him too much detail, and can still help him to decide if he should turn his attention to the servers depending on how much time he has. In other words, it helps him make tactical decisions about his time. It does so continuously, so when he has time he can quickly identify if he should use that time to investigate. By making this tiny dashboard easily and persistently visible, he can quickly evaluate the situation and develop some awareness about how often his servers become heavily loaded. But in this case the situation he’s interested in is indicated by a single sensor. Situations may be based on several sensor readings. This is also a situation he may need to check on several times a day. But what about monitoring for situations that occur far less frequently?
Let me tell you a little about a CEO I worked with who has a very different monitoring problem. She is in charge of a medium-sized company that offers services online. She has launched several programs to grow the business, and needs to keep her eye on the performance of those programs. However, she doesn’t need to look in on them every day; once a week is probably enough. But she, like our CTO, also has too many responsibilities and is constantly dealing with more immediately urgent problems. As a result, she often forgets to check up on the programs. Furthermore, she feels quite a bit of anxiety over the performance of these programs, which doesn’t help the situation and only encourages her to avoid checking up on them.
She monitors the programs by reading reports generated for her that basically look like this. If the report shows that a program is not performing well enough based on her expectations, she would investigate the situation. In many ways, this is a lot like our CTO’s server monitoring problem, except that the values change much more slowly. So we can try to apply a similar solution.
Here we can see how close each of her clients is to being an under-performer. We could just show her these four indicators, but what happens when the program grows to 20 clients? 200 clients? We need a way to compact the display, so she can just decide if she needs to read the report.
Let’s assign some values to the heed range and make it a scale. We can set 0 to minimum heed and 1 to maximum heed.
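As a rough sketch of that scale in Python: readings on the ignore side of the range map to 0, readings on the attend side map to 1, and readings in between fall somewhere from 0 to 1. The function name `heed`, the linear interpolation between the two thresholds, and the example thresholds are all assumptions made for illustration; the talk only fixes the endpoints of the scale.

```python
def heed(value: float, ignore: float, attend: float) -> float:
    """Map a raw reading onto the 0-to-1 heed scale.

    Readings past the `ignore` boundary clamp to 0.0 and readings past the
    `attend` boundary clamp to 1.0. In between, the value is interpolated
    linearly (an assumption of this sketch, not something the talk specifies).
    """
    if ignore == attend:
        raise ValueError("ignore and attend thresholds must differ")
    fraction = (value - ignore) / (attend - ignore)
    return max(0.0, min(1.0, fraction))


# Hypothetical server-load thresholds: ignore below 0.3, attend at 0.7.
for load in (0.1, 0.5, 0.9):
    print(load, round(heed(load, ignore=0.3, attend=0.7), 2))
```

Because the formula only depends on where the reading sits between the two boundaries, it also works when lower numbers are the worrying ones, as with an under-performing client: pass an `ignore` value that is higher than `attend`.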
Now we can elect the maximum heed value to represent the whole report. That means that instead of saying Client M needs her attention, it says the report needs her attention. It says that unless the whole program is performing well, she should investigate the situation.
That is essentially like an OR operation from fuzzy set theory applied to these four heed values. Now she has a single representation that summarizes the report, and again, if we make it persistently visible, she will have a continuous evaluation of the program’s performance.
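Here is a small Python sketch of that summary step, using the standard fuzzy OR, which is simply the maximum of the values. The function name `summarize` and the four heed values are invented for illustration; only “Client M” is named in the talk.

```python
from typing import Mapping, Tuple


def summarize(heeds: Mapping[str, float]) -> Tuple[str, float]:
    """Fuzzy OR over heed values: the report is as urgent as its most urgent item.

    Returns the name of the item with the highest heed and that heed value.
    """
    if not heeds:
        raise ValueError("need at least one heed value")
    name = max(heeds, key=heeds.get)
    return name, heeds[name]


# Made-up heed values on the 0-to-1 scale for four hypothetical clients.
client_heeds = {
    "Client K": 0.10,
    "Client L": 0.00,
    "Client M": 0.80,
    "Client N": 0.35,
}

worst_client, report_heed = summarize(client_heeds)
print(report_heed, worst_client)
```

Keeping the name of the worst item alongside the summary value is one way to let the display lead her from the single indicator to the client that is driving it.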
And if it looks like the program might need her attention, she can delve into the display, which first gives her a more detailed evaluation, and then gives her access to the data itself to investigate.
We can use this as a model for how heed UIs should direct user attention: presenting users with a subjective evaluation of the information and moving them toward more detailed, objective analysis. Going from Simon to House.
This is another way to improve monitoring UIs: show minimal detail, but make more detail directly available to the user through the interface. Everybody’s favorite UI paradigm: progressive disclosure.
Heed UI or: How I Learned to Stop Monitoring and Love Situation Awareness
Conrad Albrecht-Buehler
VMware
Situation Awareness
"the perception of elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future" (Endsley et al.)
“what you need to know not to be surprised” (Jeannot, Kelly, & Thompson)
Help attend to the information relevant to a goal
Help know when and what to monitor
[Diagram labels: Alarm, Heed, Dashboard, Report; axes: Attention, Information]
Reduce Mental Effort To Parse
Support Subjective Evaluation
Identify When to Monitor
Customize & Filter By Situation
Help To Forecast Situations
Only Necessary Detail
Offer More Detail On Demand
Encourage Frequent Revision