A Tool to Measure Continuous Complexity

John C. Thomas, John T. Richards, Cal Swart, Jonathan Brezin
IBM T.J. Watson Research Center
PO Box 218
Yorktown Heights, New York 10598 USA

ABSTRACT
In this paper we describe two types of complexity: continuous complexity and discontinuous complexity. We describe a tool to measure continuous complexity that we developed with an eye both to psychological theory and to the practicality of the software development process. The use of the tool requires developers to change perspective from system functionality to the tasks required of the users, and this in itself has value. The quantitative output of the tool provides metrics to measure progress in reducing overall complexity as well as pinpointing where a required task is particularly complex. Although this tool focuses on continuous complexity, it also allows the developer to document instances of discontinuous complexity. We illustrate the output of the tool for anonymized installations.

Author Keywords
Complexity, metrics, usability, modeling, methods, tools.

ACM Classification Keywords
H5.m. Information interfaces and presentation: Miscellaneous; H.1.2 Models and principles: User/machine systems; I.6.3 Simulation and modeling: Applications.

INTRODUCTION
Complexity is a widely used concept in fields as diverse as biology, computation, economics, and psychology. Here, we are concerned primarily with the psychology of complexity. The terms "psychological complexity" and "cognitive complexity" are often used interchangeably, although "psychological complexity" is potentially a broader term that could encompass, for example, the emotional complexity of relationships, the perceptual complexity of a fine Cognac, or the motor complexity of a finely tuned golf swing. Although these types of complexity might impact usability, investigators in Human-Computer Interaction have generally focused on "cognitive complexity."

It is useful to distinguish between intrinsic complexity (that supplied by the nature of the environment or the task itself) and gratuitous complexity (additional complexity imposed by a tool, system, or artifact). In general, we want to minimize gratuitous complexity, although in a few contexts that may not be the case. For instance, in vigilance tasks or in educational, aesthetic, or entertainment settings, minimization of complexity might not be the goal. Regardless of just how complex one wishes to make a system, however, having a reliable and valid way to measure complexity is important.

APPROACHES TO MEASURING COMPLEXITY
One useful approach to cognitive complexity is the work on the "Cognitive Dimensions of Notations" [4]. It is probably better thought of as a useful tool to aid discussion than as a "metric" of complexity. More quantitative methods of modeling human behavior notably include GOMS [2] and EPIC [5]. The use of these models is not limited to human-computer interaction, but they can certainly be applied there. In cases where an economically significant number of users will be using a product for a significant amount of time, such approaches can be quite useful [3]. However, in other cases, systems and applications are developed for a small number of users; increasingly, end users are even creating essentially ad hoc applications for themselves. In addition, many applications and systems are subject to updates that are made so frequently that such an extensive approach to modeling is not economically feasible. Finally, such fine modeling is
best done after special training in addition to a background in cognitive psychology. For these reasons, we set out to develop a tool to help developers quickly obtain reliable and useful quantitative metrics about the complexity of what they were building.

CONTINUOUS AND DISCONTINUOUS COMPLEXITY
Our main goal was to develop a tool that produced metrics of complexity. We envisioned this to be something that would give metrics in terms of a fairly continuous scale. In our experience, both in observing usability and in attempting to build usable applications, we also see cases where the most natural description is that usability is discontinuous. That is, the user does not simply feel somewhat more frustrated, take a little longer to do a task, or make a few more errors along the way. Rather, it often happens that the user is completely prevented from making any further progress without outside intervention. For this reason, we wanted to include in the tool a simple facility for documenting such cases.

As an example of discontinuous complexity, an installation process was meant to install several components and the installation kept failing. The associated log file was empty. After several tries, the expert user attempted to install one of the components by itself. In this case, an informative error message was returned indicating that there was not enough memory. The user added memory, re-ran the installation, and succeeded. Here, an underlying informative error message was somehow "blocked" in the over-arching process and not surfaced despite its criticality.

THE CONTEXT OF OUR MODEL
Special-purpose tools are ideally developed with respect to a particular context, set of tasks, and set of users. In our case, we wanted to develop a tool that would be useful to developers doing actual software development.

The Software Development Process
Software development has become, in many ways, a race against time. While it is difficult to amass overall statistics, even the potential outliers provide some insight. For example, one software system issued successive releases on 6-19-2001, 7-23-2001, and 8-17-2001. Another site lists release dates of 4-20-2005, 7-7-2005, 9-28-2005, 2-28-2006, and 3-10-2006. Realistically, how much do such schedules allow for unit and functional testing, let alone user testing or constructing detailed psychological models? The educated guess would be: little time indeed.

The Culture of Metrics and Tools
Our particular corporate culture places a high value on "objective" measurement. To the extent that we can provide a tool that offers a way to measure complexity with a minimum of interpretation on the input side and a maximum of quantification on the output side, that increases the likelihood of adoption as well as acceptance of results. That may or may not be ideal, but it is a reality that faced our team as designers.

The Utility of Faster Feedback in Learning
Studies have long indicated that delayed feedback can be very confusing and disruptive; for instance, talking with delayed auditory feedback or watching one's motor performance with delayed visual feedback can be very disruptive. When feedback is delayed, it can also make learning extremely difficult. Mere passage of time makes learning more difficult and, in addition, in the real world it often makes the attribution of error (and therefore, choosing among potentially corrective strategies) more ambiguous. There is a trade-off, therefore, between tools that provide the greatest verisimilitude to real-world usage (which requires, ultimately, real users using the tool with real documentation, real product, and real support systems) and those that are available as early as possible in the design-development-deployment life cycle.

THE MODEL UNDERLYING THE TOOL
The complexity model underlying the tool is based on the work of Brown, Keller, and Hellerstein [1]. This model measures complexity along three dimensions: the number of steps in a process, the number of context shifts, and the working memory load required at each step.

Rationale for Number of Steps
Of course, not all "steps" are equal, and so using the sheer number of steps as a metric is somewhat arbitrary. However, in most of the tasks we have studied so far (installation, configuration, and simple administration), the steps can be defined fairly objectively. In GUIs, every new screen is considered one step. In line-oriented interfaces, every "Enter" is considered another step. Typically, in comparing alternative products or various versions of one product, the "steps" are fairly similar in complexity (except as captured in the other two metrics, i.e., memory load and context shifts). There are two major dissatisfactions or shortcomings with the model as applied to straight-line processes. One is that it does not capture the complexity of the reading that is required either on the screen or in accompanying documentation. The second is that it does not measure how much background knowledge is required to decide which items need to be noted for future reference. Nonetheless, in general, as processes gain more steps, there is a fairly uniform increase in the chance of an error and, certainly, an increase in time. As these tasks are performed in the real world, each additional step also increases the probability of being interrupted by some other task, which again both increases the chance of error and requires added time to recover state.

Rationale for Memory Load
Memory load is increased whenever the user sees
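As a concrete illustration, the three dimensions of the model (number of steps, context shifts, and per-step memory load) can be combined into simple per-task metrics. The sketch below is our own minimal reconstruction, not the actual tool: the `Step` record, the rule that a context shift occurs whenever consecutive steps take place in different interfaces, and the high-load threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One user-visible action step (hypothetical record, not the tool's format)."""
    description: str
    context: str        # e.g. "installer GUI", "mail client", "command line"
    memory_items: int   # items the user must hold in mind or note for later steps

def complexity_metrics(steps, high_load=3):
    """Summarize a task along the model's three dimensions.

    A context shift is counted whenever two consecutive steps occur in
    different contexts -- an assumed operationalization for this sketch.
    """
    shifts = sum(1 for a, b in zip(steps, steps[1:]) if a.context != b.context)
    total_load = sum(s.memory_items for s in steps)
    spikes = [s.description for s in steps if s.memory_items >= high_load]
    return {
        "steps": len(steps),
        "context_shifts": shifts,
        "total_memory_load": total_load,
        "high_load_steps": spikes,  # steps a tool might flag as "spikes"
    }

# A hypothetical four-step installation task:
task = [
    Step("Launch installer", "installer GUI", 0),
    Step("Copy license key from confirmation email", "mail client", 1),
    Step("Paste key; note install path and port", "installer GUI", 3),
    Step("Verify the service started", "command line", 1),
]
print(complexity_metrics(task))
# -> {'steps': 4, 'context_shifts': 3, 'total_memory_load': 5,
#     'high_load_steps': ['Paste key; note install path and port']}
```

Even this crude scoring makes the trade-offs discussed above visible: adding a confirmation screen raises the step count, while asking the user to carry a value from one interface to another raises both the shift count and the memory load.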
Figure 1. Action steps needed to install comparable products when taking all defaults and for a custom installation.

Figure two shows, however, that custom installs require considerably more memory load. Taken together, these figures illustrate that simple but meaningful comparisons between whole software packages are possible using our tool.

Figure 2. Memory load needed to install comparable products when taking all defaults and for a custom installation.

In addition to allowing overall comparisons, the tool can help pinpoint specific areas of complexity in terms of memory load, as shown in Figure 3, below.

Figure 3. Blue "spikes" illustrate action steps that require a particularly high memory load.

USAGE
Seventy-five people have registered to use the tool. Interviews with a subset of users find general agreement that the current user interface is relatively straightforward to use and a significant improvement over the first iteration. The tool is being used by personnel in a number of different product lines within our company, by both developers and UI practitioners.

CONCLUSION
The tool, though based on a simplified model of cognitive complexity, is useful and usable by developers during development. To the extent that development teams take the effort to really engage in "Outside-In Design" and specify relatively detailed user tasks in advance of system design, the tool can be used even earlier in the overall system development process. The tool helps management determine roughly the competitive position of their products with respect to complexity and whether progress toward simplification is being made with successive versions. For developers and HCI professionals, the tool also provides pointers to those places in a task that require a particularly large memory load, thereby focusing efforts to improve usability.

ACKNOWLEDGMENTS
We thank anonymous reviewers as well as our colleagues for comments and suggestions regarding this paper. We thank our users of the tool and our funders.

REFERENCES
1. Brown, A. B., Keller, A., and Hellerstein, J. L. (2005). A model of configuration complexity and its application to a change management system. In Proceedings of the Ninth IFIP/IEEE International Symposium on Integrated Network Management (IM 2005).
2. Card, S. K., Moran, T. P., and Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, NJ: Erlbaum.
3. Gray, W. D., John, B. E., Stuart, R., Lawrence, D., and Atwood, M. E. (1990). GOMS meets the phone company: Analytic modeling applied to real-world problems. In Proceedings of IFIP '90: Human-Computer Interaction, 29-34.
4. Green, T. R. G. (1989). Cognitive dimensions of notations. In A. Sutcliffe and L. Macaulay (Eds.), People and Computers V. Cambridge: Cambridge University Press.
5. Kieras, D. and Meyer, D. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391-438.
6. Rauterberg, M. (1996). How to measure cognitive complexity in human-computer interaction. In Cybernetics and Systems '96, 815-820. Vienna: Austrian Society for Cybernetic Studies.