Evaluating User Interface Systems Research

Alex pointed me to Evaluating User Interface Systems Research, an article by Dan R. Olsen Jr. that was published at UIST 2007 as part of a panel discussion.

Abstract:

The development of user interface systems has languished with the stability of desktop computing. Future systems, however, that are off-the-desktop, nomadic or physical in nature will involve new devices and new software systems for creating interactive applications. Simple usability testing is not adequate for evaluating complex systems. A set of criteria for evaluating new UI systems work is presented.

What I found interesting about this paper is that Olsen tries to address the problem of evaluating UI architectures and toolkits. We assume almost everything in HCI has to be validated by usability tests, even though that doesn’t make sense for toolkits and architectures. He proposes a set of alternative evaluation techniques. Olsen knows what he is talking about: he created the impressive XWeb system.

The paper addresses the question “How should we evaluate new user interface systems so that true progress is being made?”. The author motivates this question by stating that UI systems research (e.g. toolkit or windowing system architecture and design) is still necessary if we want to move beyond the desktop. Lots of good research into input techniques needs better systems models. For example, multi-user, multi-touch systems are often forced into the standard mouse-point model, even though these systems produce inputs the size of a hand or finger and are used by multiple users at once. Multiple input points and multiple users are discarded when everything is compressed into the mouse/keyboard input model (although multiple users can usually be handled with multiple mouse cursors). Systems based on one screen, one keyboard and one mouse are the new equivalent of command-line interfaces.
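
To make the point concrete, here is a minimal sketch (my own, not from the paper; all names are hypothetical) of what gets lost when a rich touch event is squeezed into the classic mouse model:

```python
# Hypothetical sketch: compressing a rich touch event into the mouse model.
from dataclasses import dataclass

@dataclass
class MouseEvent:
    x: float
    y: float
    button: int           # a single point from a single user

@dataclass
class TouchEvent:
    x: float
    y: float
    contact_width: float   # size of the finger or hand contact area
    contact_height: float
    touch_id: int          # one of several simultaneous contacts
    user_id: int           # one of several simultaneous users

def to_mouse(event: TouchEvent) -> MouseEvent:
    """Keeping only a point: contact size, multiple touch points and
    user identity are all discarded in the conversion."""
    return MouseEvent(x=event.x, y=event.y, button=0)
```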

Olsen discusses a few benefits of a good UI systems architecture:

  • reduce development viscosity
  • least resistance to good solutions
  • lower skill barriers
  • power in common infrastructure
  • enabling scale

He then goes on to discuss the usability trap. According to Olsen, usability testing rests on three key assumptions, and toolkits and UI architectures rarely meet them. The first assumption is that users have minimal training (“walk up and use”). It is clear that any toolkit requires expertise to use. Secondly, to compare systems (or techniques) we assume that there is a task that is reasonably similar between the two systems (“standardized task”). This is also violated by toolkits and UI architectures: any problem that requires a system architecture or a toolkit is by nature complex and will have many possible paths to a solution, so meaningful comparisons between two tools for a realistic problem are confounded in many ways. Finally, we assume that it must be possible to complete any test in 1-2 hours (“scale of the problem”). Again, this is impossible with toolkits and UI architectures, since building a significant application with two different tools would be very costly.

The usability trap is the idea that good HCI research by definition requires usability testing. Olsen clearly shows where usability testing is not suitable and proposes alternative ways to evaluate these systems. He also argues that searching for “fatal flaws” in a system is devastating for systems research. It is virtually impossible for a small team of researchers to recreate all of the capabilities of existing systems. The omission of an important feature is guaranteed, and the existence of a fatal flaw is a given.

First, Olsen states that we should clearly specify our research in the context of situations, tasks and users (“STU”). He then discusses a few criteria that are useful to evaluate a system innovation, and shows how to demonstrate that a system complies with these criteria. The ones he discusses are:

  • Importance
  • Problem not previously solved
  • Generality
  • Reduce solution viscosity
    • Flexibility
    • Expressive leverage
    • Expressive match
  • Empowering new design participants
  • Power in combination
    • Inductive combination
    • Simplifying interconnection
    • Ease of combination
  • Can it scale up?

While I won’t go through all of these criteria, I’ll give a few examples. For instance, importance can be demonstrated through the importance of the user population (“U”), the importance of the tasks (“T”) and the importance of the situations (“S”): how often do the target users find themselves in these situations, and do they need to perform these tasks there?

Expressive match is an estimate of how close the means for expressing design choices are to the problem being solved. It is a way to reduce solution viscosity (the effort required to iterate over many possible solutions). For example, one can express a color in hexadecimal, or one can pop up a color picker that displays the color space in various ways and shows the currently selected color. The color picker is a much closer match to the design problem.
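
As a small illustration of my own (not taken from the paper), compare two textual ways of expressing the same color choice. A picker goes further still by making the whole color space visible, but even moving from hex digits to hue/saturation/lightness brings the representation closer to how a designer thinks about the choice:

```python
# Illustrative sketch: the same design choice at two levels of expressive match.
import colorsys

# Low match: the designer wants "a slightly darker, warmer orange" but must
# express that intent by editing hex digits.
color_hex = "#D2691E"

# Higher match: the representation uses the terms the designer thinks in
# (hue, saturation, lightness), so iterating on the choice is cheaper.
hue, saturation, lightness = 25 / 360, 0.75, 0.47

r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
print(f"#{round(r * 255):02X}{round(g * 255):02X}{round(b * 255):02X}")
```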

Simplifying interconnection comes down to reducing the cost of introducing a new component from N to 1. Suppose we have N components working together. If every component must implement an interconnection with every other component, then the (N+1)th component must include N interconnections with the existing pieces. A good interconnection model reduces the cost of a new component from N to 1: for example, each new component only needs to implement a standard interface, after which it can work with all other components. Olsen gives the example of pipes in UNIX.
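
Here is a minimal sketch of that idea (the interface and component names are my own, not Olsen's): because every stage implements the same contract and composes like a UNIX pipe, adding a new component costs one implementation rather than N pairwise adapters.

```python
# Minimal sketch: a common interface means a new component implements one
# contract instead of N pairwise interconnections.
from typing import Iterable, Protocol

class Filter(Protocol):
    def process(self, items: Iterable[str]) -> Iterable[str]: ...

class Uppercase:
    def process(self, items):
        return (s.upper() for s in items)

class NonEmpty:
    def process(self, items):
        return (s for s in items if s.strip())

class Numbered:
    # The "(N+1)th" component: one method, no knowledge of the others.
    def process(self, items):
        return (f"{i}: {s}" for i, s in enumerate(items, 1))

def pipeline(source, filters):
    """Compose like UNIX pipes: the output of one stage feeds the next."""
    for f in filters:
        source = f.process(source)
    return list(source)

print(pipeline(["hello", "", "world"], [NonEmpty(), Uppercase(), Numbered()]))
# -> ['1: HELLO', '2: WORLD']
```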

Ease of combination stresses that interconnections should be simple and straightforward. As an example, Olsen contrasts the simple HTTP protocol and REST architecture with the overly complex SOAP protocol. This is no surprise, since Olsen based XWeb on the WWW architecture.
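
To give a feel for the contrast (the endpoint and operation below are hypothetical, purely for illustration), here is the same request expressed as a plain REST call versus the envelope SOAP would require:

```python
# Contrast sketch: plain HTTP/REST request vs. the SOAP envelope for the
# same (hypothetical) "get order 42" operation.
rest_request = (
    "GET /orders/42 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: application/json\r\n\r\n"
)

soap_request = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Header/>
  <soap:Body>
    <GetOrder xmlns="http://example.com/orders">
      <OrderId>42</OrderId>
    </GetOrder>
  </soap:Body>
</soap:Envelope>"""

# The REST form is a one-line request over plain HTTP; the SOAP form needs an
# envelope, namespaces and a separately defined operation before two
# components can talk at all.
print(len(rest_request), len(soap_request))
```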

It might be interesting to introduce this paper in the course Evaluation of user interfaces to give another perspective on evaluation methods.
