Note on Tool to Measure Complexity



A short note on a tool to measure the complexity of interfaces.

Published in: Technology

A Tool to Measure Continuous Complexity

John C. Thomas, John T. Richards, Cal Swart, Jonathan Brezin
IBM T.J. Watson Research Center
PO Box 218, Yorktown Heights, New York 10598 USA

ABSTRACT
In this paper we describe two types of complexity: continuous complexity and discontinuous complexity. We describe a tool to measure continuous complexity that we developed with an eye both to psychological theory and to the practicality of the software development process. The use of the tool requires developers to change perspective from system functionality to the tasks required of the users, and this in itself has value. The quantitative output of the tool provides metrics to measure progress in reducing overall complexity as well as pinpointing where a required task is particularly complex. Although this tool focuses on continuous complexity, it also allows the developer to document instances of discontinuous complexity. We illustrate the output of the tool for anonymized installations.

Author Keywords
Complexity, metrics, usability, modeling, methods, tools.

ACM Classification Keywords
H5.m. Information interfaces and presentation: Miscellaneous; H.1.2 Models and principles: User/machine systems; I.6.3 Simulation and modeling: Applications.

INTRODUCTION
Complexity is a widely used concept in fields as diverse as biology, computation, economics and psychology. Here, we are concerned primarily with the psychology of complexity. The terms “psychological complexity” and “cognitive complexity” are often used interchangeably, although “psychological complexity” is potentially a broader term that could encompass, for example, the emotional complexity of relationships, the perceptual complexity of a fine Cognac or the motor complexity of a finely tuned golf swing. Although these types of complexity might impact usability, investigators in Human Computer Interaction have generally focused on “cognitive complexity” [6].

It is useful to distinguish between intrinsic complexity (that supplied by the nature of the environment or the task itself) and gratuitous complexity (additional complexity imposed by a tool, system, or artifact). In general, we want to minimize gratuitous complexity, although in a few contexts that may not be the case. For instance, in vigilance tasks and in educational, aesthetic, or entertainment settings, minimization of complexity might not be the goal. Regardless of just how complex one wishes to make a system, however, having a reliable and valid way to measure complexity is important.

APPROACHES TO MEASURE COMPLEXITY
One useful approach to cognitive complexity is the work on the “Cognitive Dimensions of Complexity” [4]. It is probably better thought of as a useful tool to aid discussion than as a “metric” of complexity. More quantitative methods of modeling human behavior include, notably, GOMS [2] and EPIC [5]. The use of these models is not limited to human-computer interaction, but they can certainly be applied there. In cases where an economically significant number of users will be using a product for a significant amount of time, such approaches can be quite useful [3]. However, in other cases, systems and applications are developed for a small number of users; increasingly, end users are even creating essentially ad hoc applications for themselves. In addition, many applications and systems are subject to updates that are made so frequently that such an extensive approach to modeling is not economically feasible. Finally, such fine modeling is
best done after special training in addition to a background in cognitive psychology. For these reasons, we set out to develop a tool to help developers quickly obtain reliable and useful quantitative metrics about the complexity of what they were building.

CONTINUOUS AND DISCONTINUOUS COMPLEXITY
Our main goal was to develop a tool that produced metrics of complexity. We envisioned this to be something that would give metrics in terms of a fairly continuous scale. In our experience both with observing usability and in attempting to build usable applications, we also see cases where the most natural description is that usability is discontinuous. That is, the user does not simply feel somewhat more frustrated, take a little longer to do a task, or make a few more errors along the way. Rather, it often happens that the user is completely prevented from making any further progress without outside intervention. For this reason, we wanted to include in the tool a simple facility for documenting such cases.

As an example of discontinuous complexity, an installation process was meant to install several components and the installation kept failing. The associated log file was empty. After several tries, the expert user attempted to install one of the components by itself. In this case, an informative error message returned indicating that there was not enough memory. The user added memory, re-ran the installation and succeeded. In this case, an underlying informative error message was somehow “blocked” in the over-arching process and not surfaced despite its criticality.

THE CONTEXT OF OUR MODEL
Special purpose tools are ideally developed with respect to a particular context, set of tasks, and set of users. In our case, we wanted to develop a tool that would be useful to developers doing actual software development.

The Software Development Process
Software development has become, in many ways, a race against time. While it is difficult to amass overall statistics, even the potential outliers provide some insight. For example, one software system issued successive releases on 6-19-01, 7-23-01 and 8-17-2001. Another site lists release dates as 4-20-2005, 7-7-2005, 9-28-2005, 2-28-2006, and 3-10-2006. Realistically, how much do such schedules allow for unit and functional testing, let alone user testing or constructing detailed psychological models? The educated guess would be: little time indeed.

The Culture of Metrics and Tools
Our particular corporate culture places a high value on “objective” measurement. To the extent that we can provide a tool that offers a way to measure complexity with a minimum of interpretation on the input side and a maximum of quantification on the output side, that increases the likelihood of adoption as well as acceptance of results. That may or may not be ideal, but it is a reality that faced our team as designers.

The Utility of Faster Feedback in Learning
Studies have long indicated that delayed feedback can be very confusing and disruptive; for instance, talking with delayed auditory feedback or watching one’s motor performance with delayed visual feedback can be very disruptive. When feedback is delayed, it can also make learning extremely difficult. Mere passage of time makes learning more difficult and, in addition, in the real world it often makes the attribution of error (and therefore choosing among potentially corrective strategies) more ambiguous. There is therefore a trade-off between tools that provide the greatest verisimilitude to real world usage (which requires, ultimately, real users using the tool with real documentation, real product and real support systems) and those that are available as early as possible in the design-development-deployment life cycle.

THE MODEL UNDERLYING THE TOOL
The complexity model underlying the tool is based on the work of Brown, Keller and Hellerstein [1]. This model measures complexity along three dimensions: the number of steps in a process, the number of context shifts, and the working memory load required at each step.

Rationale for Number of Steps
Of course, not all “steps” are equal and so using the sheer number of steps as a metric is somewhat arbitrary. However, in most of the tasks we have studied so far (installation, configuration, and simple administration), the steps can be defined fairly objectively. In GUIs, every new screen is considered one step. In line-oriented interfaces, every “Enter” is considered another step. Typically, in comparing alternative products or various versions of one product, the “steps” are fairly similar in complexity (except as captured in the other two metrics, i.e., memory load and context shifts). There are two major shortcomings of the model as applied to straight-line processes. One is that it does not capture the complexity of the reading that is required either on the screen or with accompanying documentation. The second is that it does not measure how much background knowledge is required to decide which items need to be noted for future reference. Nonetheless, in general, as processes gain more steps, there is a fairly uniform increase in the chance of an error and, certainly, an increase in time. As these tasks are performed in the real world, each additional step also increases the probability of being interrupted by some other task, which again both increases the chance of error and requires added time to recover state.

Rationale for Memory Load
Memory load is increased whenever the user sees
something on a screen that must be stored and used for some future step. Again, in detail, we know that the actual memory load will depend on the type of item that needs to be stored and on the user’s experience and strategies. However, as a first approximation, each new “item” that the user must take note of and remember increases felt complexity as well as the chance for error. Even without error, it takes longer to recover a particular item from working memory if there are more items in working memory.

Rationale for Context Shifts
Context shifts were originally defined by the model builders in terms of computing contexts (e.g., server vs. client, or operating system vs. database). We have kept such changes as context shifts but broadened the definition to include shifts between applications or between installation components. If an installation requires the installation of three sub-components, these components often have somewhat different appearances and conventions.

Context shifts can be disruptive to working memory. In addition, different contexts often employ different conventions, and this can cause interference resulting in longer latencies, a greater chance of error, or both.

ITERATIVE TOOL DEVELOPMENT PROCESS
The original model that we built on took a detailed XML description of the task as input. We thought it unlikely that developers would use a complexity metric that required this. Therefore, we developed a GUI tool to allow users to define tasks, action steps, context switches, and memory load without having to use XML. The tool was used by a small group of people for some months. Interviews, observations, and spontaneous comments were all used to drive a continuous round of changes in the interface.

THE STRUCTURE OF THE TOOL AND DATABASE
The tool we developed to facilitate the data entry required by the model is a classic dynamic HTML application with the persistent data held in a relational database at the server. The client is implemented with XHTML and JavaScript, and the server with PHP and MySQL.

There are two phases from the point of view of user input: creating or choosing the task to work on, and, for a fixed task, entering its details. Working with the task list is simple and handled easily with a drop-down list and a button or two. The essential problem is to use the screen real estate effectively to enter the details for a fixed task, which involves two parallel lists, actions and data, both of which can be expected to be of the order of several tens long, perhaps a hundred or so. There is also a much smaller list of contexts, fewer than half a dozen in the vast majority of cases.

Actions take place in contexts, and they produce and consume data, so it should be convenient to enter new items of any of these types at any point. To answer this need, we used a tabbed display, one tab each for actions, data, and contexts, and one for the final scoring of the model. (The tabs are implemented using the Dojo JavaScript toolkit.) The model for a single task is sufficiently small that it can easily be kept in JavaScript data structures, so moving between tabs involves no delay. When, for instance, the actions tab is selected, a list of actions is shown (in the order in which they occurred), and one action is shown as selected. The selected action’s details are visible for editing, and buttons are used to maintain the list: inserting a new action, reordering those already there, or marking the points at which context switches occur. Editing changes are immediately transmitted by an asynchronous HTTP POST to the server, where they are written to a database that is used to maintain the persistent state of the task model. The data and context tabs differ from the actions tab only in their view of the content of the model.

The relational database is straightforward: it requires tables for tracking users, tasks, actions, data, contexts, and data usage. The latter tracks pairs of data items and actions to record which actions produce, and which consume, what data. The users table allows us to use a single database to serve many users. Each task table entry has a column that indicates which user’s task it is. Each of the remaining tables (actions, etc.) has a column that indicates which task it belongs to. Thus each action, etc. belongs to a single task, and each task to a single user.

For purposes of communicating the model to other applications that might wish to use it, there is an XML Schema that describes an XML document for the raw data associated with a single task. There will be a forthcoming Schema for the scored model as well.

One common and convenient method of working is to open two instances of the tool: one where successive action steps are noted and one where data items are noted in order to calculate memory load. In some cases, tool users watch an expert perform a task, take notes and then code the result. In other cases, an expert performs a task such as installation and captures each step via screen shots, which are then sent to the tool user for coding.

EXAMPLE CONTINUOUS RESULTS
The tool allows developers and managers to calculate overall metrics for their products and to gauge progress through successive releases. Figures 1 and 2 show anonymized results for installation of comparable products in terms of steps and memory load, respectively. The first figure shows that products differ significantly in number of steps required and that custom installs require only a few more steps than taking all the defaults.
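As a rough illustration of how step and memory-load figures of this kind might be derived from the task model described above, the following is a minimal sketch and not the tool's actual implementation. It uses Python with an in-memory SQLite database in place of PHP and MySQL so that it runs standalone. All table names, column names, the sample task, and the scoring rules are assumptions made for illustration: steps are counted as actions, a context shift as any pair of consecutive actions in different contexts, and the memory load at a step as the number of data items produced at an earlier step and still awaiting their last use. The scoring of Brown, Keller and Hellerstein [1] may differ in detail.

```python
import sqlite3

# Hypothetical schema: names are illustrative assumptions, not the tool's own.
SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tasks    (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                       user_id INTEGER NOT NULL REFERENCES users(id));
CREATE TABLE contexts (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       task_id INTEGER NOT NULL REFERENCES tasks(id));
CREATE TABLE actions  (id INTEGER PRIMARY KEY, label TEXT NOT NULL,
                       position INTEGER NOT NULL,
                       context_id INTEGER NOT NULL REFERENCES contexts(id),
                       task_id INTEGER NOT NULL REFERENCES tasks(id));
CREATE TABLE data     (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                       task_id INTEGER NOT NULL REFERENCES tasks(id));
CREATE TABLE data_usage (
    data_id   INTEGER NOT NULL REFERENCES data(id),
    action_id INTEGER NOT NULL REFERENCES actions(id),
    role      TEXT NOT NULL CHECK (role IN ('produce', 'consume')));
"""


def score_task(conn, task_id):
    """Return step count, context shifts, and per-step memory load (assumed rules)."""
    actions = conn.execute(
        "SELECT id, context_id FROM actions WHERE task_id = ? ORDER BY position",
        (task_id,),
    ).fetchall()
    order = {action_id: i for i, (action_id, _) in enumerate(actions)}

    # A context shift is counted whenever consecutive actions differ in context.
    shifts = sum(1 for (_, a), (_, b) in zip(actions, actions[1:]) if a != b)

    usage = conn.execute(
        "SELECT u.data_id, u.action_id, u.role FROM data_usage u "
        "JOIN data d ON d.id = u.data_id WHERE d.task_id = ?",
        (task_id,),
    ).fetchall()
    produced, consumed = {}, {}
    for data_id, action_id, role in usage:
        step = order[action_id]
        if role == "produce":  # earliest step at which the item appears
            produced[data_id] = min(step, produced.get(data_id, step))
        else:                  # latest step at which the item is needed
            consumed[data_id] = max(step, consumed.get(data_id, step))

    # An item is held in memory from the step after it appears through its last use.
    load = [0] * len(actions)
    for data_id, start in produced.items():
        for step in range(start + 1, consumed.get(data_id, start) + 1):
            load[step] += 1
    return {"steps": len(actions), "context_shifts": shifts, "memory_load": load}


# Invented sample task, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'demo user')")
conn.execute("INSERT INTO tasks VALUES (1, 'Install product', 1)")
conn.executemany("INSERT INTO contexts VALUES (?, ?, 1)",
                 [(1, "operating system"), (2, "installer"), (3, "database")])
conn.executemany("INSERT INTO actions VALUES (?, ?, ?, ?, 1)", [
    (1, "Launch installer", 0, 1),
    (2, "Note generated port", 1, 2),
    (3, "Create database user", 2, 3),
    (4, "Enter port and password", 3, 2),
])
conn.executemany("INSERT INTO data VALUES (?, ?, 1)", [(1, "port"), (2, "password")])
conn.executemany("INSERT INTO data_usage VALUES (?, ?, ?)", [
    (1, 2, "produce"), (1, 4, "consume"),
    (2, 3, "produce"), (2, 4, "consume"),
])

result = score_task(conn, 1)
print(result)
```

For this invented four-step task the sketch reports 4 steps, 3 context shifts, and a per-step memory load of 0, 0, 1, 2. A per-step profile of this kind is what would expose steps with an unusually high memory load, while the totals support comparisons across products or releases.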
Figure 1. Action steps needed to install comparable products when taking all defaults and for a custom installation.

Figure 2 shows, however, that custom installs require considerably more memory load. Taken together, these figures illustrate that simple but meaningful comparisons between whole software packages are possible using our tool.

Figure 2. Memory load needed to install comparable products when taking all defaults and for a custom installation.

In addition to allowing overall comparisons, the tool can help pinpoint specific areas of complexity in terms of memory load, as shown in Figure 3.

Figure 3. Blue “spikes” illustrate action steps that require a particularly high memory load.

USAGE
The tool has had 75 people register to use it. Interviews with a subset of users find general agreement that the current user interface is relatively straightforward to use and a significant improvement over the first iteration. The tool is being used by personnel in a number of different product lines within our company. It is used both by developers and by UI practitioners.

CONCLUSION
The tool, though based on a simplified model of cognitive complexity, is useful and usable by developers during development. To the extent that development teams take the effort to really engage in “Outside-In-Design” and specify relatively detailed user tasks in advance of system design, the tool can be used even earlier in the overall system development process. The tool helps management determine roughly the competitive position of their products with respect to complexity and whether progress toward simplification is being made with successive versions. For developers and HCI professionals, the tool also provides pointers to those places in a task which require a particularly large memory load, thereby focusing efforts to improve usability.

ACKNOWLEDGMENTS
We thank anonymous reviewers as well as our colleagues for comments and suggestions regarding this paper. We thank our users of the tool and our funders.

REFERENCES
1. Brown, A.B., Keller, A. and Hellerstein, J.L. (2005). A model of configuration complexity and its application to a change management system. In Proceedings of the Ninth IFIP/IEEE International Symposium on Integrated Network Management (IM 2005).
2. Card, S.K., Moran, T.P. and Newell, A. (1983). The psychology of human-computer interaction. Hillsdale, N.J.: Erlbaum.
3. Gray, W.D., John, B.E., Stuart, R., Lawrence, D. and Atwood, M.E. (1990). GOMS meets the phone company: Analytic modeling applied to real-world problems. In Proceedings of IFIP ’90: Human Computer Interaction, 29-34.
4. Green, T.R.G. (1989). Cognitive dimensions of notations. In A. Sutcliffe and L. Macaulay (Eds.), People and computers V. Cambridge: Cambridge University Press.
5. Kieras, D. and Meyer, D. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391-438.
6. Rauterberg, M. (1996). How to measure cognitive complexity in human-computer interaction. In Cybernetics and Systems ’96, 815-820. Vienna: Austrian Society for Cybernetic Studies.