Designers need some basis for choosing between the huge number of possible techniques that can support error handling. User studies, and other standard HCI methods for gathering qualitative and quantitative data about user interfaces, can be a major source of guidance. A variety of results which can guide us in the design of error recovery interfaces are already present in the literature. Although many of these studies are small and limited in their representation, this only demonstrates how much we have to gain from investigating the area more deeply.
One place to begin is by observing users in situations where error correction occurs, both in everyday life (Baber & Hone, 1993; Zajicek & Hewitt, 1990) and in interactions with error-prone computer programs (Nanja & Cook, 1987). For example, both Baber and Hone, and Zajicek and Hewitt, studied the effectiveness of human-like recovery strategies in the context of speech recognition. Their work verifies that linguistic theories about human conversation patterns can guide the design of error recovery techniques.
Although it is possible to ask users direct questions about how they handle errors, this may miss the point, since the best error handling happens with as little conscious attention as possible. An alternative is to compare task completion speeds with and without error correction support, and to test for satisfaction and frustration. Riseberg et al. (1998) did some innovative work in measuring frustration quantitatively as well as qualitatively in their research on affect (the measurable aspects of emotions).
In order to compare studies of different error correction interfaces that could be used in the same application, Suhm (1997) suggests normalizing the data by the number of errors that occur. For systems that generate ASCII text, he also devised a way to relate accuracy to words per minute (1996a).
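As a rough illustration of the idea (this is a minimal sketch, not Suhm's exact formulation; the function names and session figures below are hypothetical), normalizing by error count lets studies with different recognizer accuracies be compared on correction effort alone, while an effective words-per-minute figure folds correction time into overall throughput:

```python
# Hedged sketch of two comparison metrics for error correction studies.
# Neither formula is taken from Suhm's papers; both are illustrative.

def time_per_correction(total_correction_time_s: float, num_errors: int) -> float:
    """Average time spent repairing each recognition error.

    Dividing by the error count removes the effect of recognizer
    accuracy, so two interfaces tested under different error rates
    can still be compared on correction effort alone.
    """
    return total_correction_time_s / num_errors

def effective_wpm(words_entered: int, total_time_s: float) -> float:
    """Throughput in words per minute, including time lost to correction."""
    return words_entered / (total_time_s / 60.0)

# Hypothetical sessions: a menu-pick interface saw 15 errors and 90 s
# of correction work; a respeak interface saw 12 errors and 48 s.
menu_pick = time_per_correction(total_correction_time_s=90.0, num_errors=15)
respeak = time_per_correction(total_correction_time_s=48.0, num_errors=12)

print(f"menu pick: {menu_pick:.1f} s/error, respeak: {respeak:.1f} s/error")
print(f"effective throughput: {effective_wpm(120, 300.0):.1f} wpm")
```

On these made-up numbers, respeaking costs less time per error (4.0 s vs. 6.0 s) even though the raw session data would be hard to compare directly.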
The simplest possible form of error correction is to repeat the input that was misrecognized. Our survey uncovered several studies that compare more sophisticated correction techniques to repetition. Zajicek and Hewitt found that users prefer to repeat their input at least once before having to choose from a menu, a finding confirmed by Ainsworth & Pratt (1992). Also, in the realm of pen input, Goldberg & Goodisman (1991) found that even when alternative guesses are displayed, it takes too much cognitive effort for the user to select from them, a result that meshes with observations about input speed made in the word prediction community (Alm et al., 1992). Baber & Hone (1993) give a good overview of the pros and cons of repetition versus choice. Suhm (1997) added to this work when he found that spoken repetition is faster than choosing from a list, but that partial word repair is better than both. Partial word repair allows users to correct part of a word when it is almost correct, either with a pen or with spoken input.
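The intuition behind partial word repair can be sketched in a few lines (this is an illustrative model, not Suhm's implementation; the helper name and indices are hypothetical): instead of re-entering the whole word, the user supplies a replacement for only the misrecognized span.

```python
# Illustrative sketch of partial word repair: the user corrects only
# the span of the word that was misrecognized, leaving the rest intact.

def repair_span(word: str, start: int, end: int, correction: str) -> str:
    """Replace word[start:end] with the user's correction fragment."""
    return word[:start] + correction + word[end:]

# The recognizer output "recieve" is almost right; the user repairs
# only the transposed "ie" span rather than respeaking the whole word.
fixed = repair_span("recieve", 3, 5, "ei")
print(fixed)  # "receive"
```

The appeal for users is that the repair effort scales with the size of the mistake rather than the length of the input, which is consistent with the finding above that repair beats both full repetition and list selection.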
User testing can help to identify the sources of errors as well as to guide the design of error handling techniques. For example, Frankish et al. (1995) found that systems tend to misrecognize a subset of possible written inputs much more often than the rest, a result confirmed by Marx & Schmandt (1994) in the realm of speech recognition.