The landmark work of Douglas Engelbart and his team of researchers at SRI in the 1960s demonstrated the power of building toolkits to bootstrap the development of increasingly sophisticated interactive systems. Each of the functional themes discussed above provides opportunities for developing toolkits that augment programmers' ability to implement applications.
To develop more applications that support transparent interaction, we must be able to treat other forms of input as easily as we treat keyboard and mouse input. A good example justifying the need for transparent interaction toolkits is freeform, pen-based interaction. Much of the interest in pen-based computing has focused on recognition techniques to convert pen input to text. But many applications, such as note-taking in Classroom 2000, do not require conversion from pen to text. Relatively little effort has been put into standardizing support for freeform pen input. Some formats for exchanging pen input between platforms exist, such as JOT, but there is no standard support for using pen input effectively within an application.
For example, Tivoli provides basic support for creating ink data, distinguishing between uninterpreted ink data and special gestures [15, 17]. A particularly important feature of these ink data types is the ability to cluster them. In producing Web-based notes in Classroom 2000, we want annotations done with a pen to link to audio or video. The annotations are timestamped, but it is not all that useful to associate an individual penstroke with the exact time it was written in class. We used a temporal and spatial heuristic to statically cluster penstrokes together and assign them a more meaningful, word-level time. Chiu and Wilcox have produced a more general and dynamic clustering algorithm to link audio and ink [8]. Clustering of freeform ink is also useful in real time to support a variety of whiteboard interactions (e.g., inserting a blank line between consecutive lines), and implicit structuring has been used to do this [16]. These clustering techniques need to become standard and available to all application developers who wish to create freeform, pen-based interfaces.
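The temporal and spatial heuristic described above can be sketched as follows. This is a minimal illustration, not the Classroom 2000 implementation: the `Stroke` representation, the greedy grouping strategy, and the gap/distance thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Stroke:
    t: float   # pen-down timestamp, in seconds (hypothetical representation)
    x: float   # centroid x of the stroke's points
    y: float   # centroid y of the stroke's points

def cluster_strokes(strokes, max_gap=1.0, max_dist=50.0):
    """Greedily group strokes that are close in both time and space."""
    clusters = []
    for s in sorted(strokes, key=lambda s: s.t):
        if clusters:
            last = clusters[-1][-1]
            near_in_time = s.t - last.t <= max_gap
            near_in_space = (abs(s.x - last.x) <= max_dist and
                             abs(s.y - last.y) <= max_dist)
            if near_in_time and near_in_space:
                clusters[-1].append(s)
                continue
        clusters.append([s])
    return clusters

# Each resulting cluster can then be assigned a single word-level
# timestamp, e.g., the time of its first stroke.
strokes = [Stroke(0.0, 10, 10), Stroke(0.4, 30, 12), Stroke(5.0, 200, 12)]
groups = cluster_strokes(strokes)
print([len(g) for g in groups])  # → [2, 1]
```

The first two strokes fall in one cluster (written within a second of each other, near the same spot); the third, written seconds later, starts a new one.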
Designing context-aware applications is difficult for a number of reasons. One example comes from our experience with location-awareness in Cyberguide. We used many different positioning systems throughout the project, both indoor and outdoor. Each prototype had its own positioning-system-specific code to handle the idiosyncrasies of the particular positioning system used. A location-aware application should be written to a single location-aware API, but this does not exist.
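The single location-aware API the text calls for might look like the following sketch. The driver classes, method names, and stubbed coordinates are illustrative assumptions; the point is that the application programs against one interface while each positioning technology is wrapped once.

```python
from abc import ABC, abstractmethod

class PositioningSystem(ABC):
    """Driver interface that each positioning technology implements once."""
    @abstractmethod
    def read_position(self):
        """Return (latitude, longitude) in a common coordinate frame."""

class GPSDriver(PositioningSystem):
    def read_position(self):
        return (33.7756, -84.3963)   # stubbed outdoor fix, for illustration

class InfraredBeaconDriver(PositioningSystem):
    def read_position(self):
        return (33.7760, -84.3970)   # stubbed indoor position, for illustration

class LocationService:
    """The single API an application like Cyberguide would be written to."""
    def __init__(self, driver: PositioningSystem):
        self.driver = driver

    def where_am_i(self):
        return self.driver.read_position()

# The application code is identical indoors and out; only the driver changes.
outdoor = LocationService(GPSDriver())
indoor = LocationService(InfraredBeaconDriver())
print(outdoor.where_am_i())  # → (33.7756, -84.3963)
```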
An analogy to GUI programming is appropriate here. GUI programming is simplified by toolkits with predefined interactors or widgets that can be easily used. In theory, a toolkit of commonly used context objects would be similarly useful, but what would the building blocks of such a toolkit be? The context that has proven most useful provides location information, identification, timing, and an association of sensors to entities in the physical world. Just as desktop computing is built on a WIMP (windows, icons, menus and pointers) interface, we suggest that context-aware computing can be built on a TILE (time, identity, location, entities) interface. In the next section, we will return to the issue of a context-aware infrastructure that separates the concerns of the environment from those of the application.
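A minimal sketch of TILE-style building blocks might combine a timestamped context event with a registry associating sensors to physical-world entities. All names here (`ContextEvent`, `EntityRegistry`, the badge identifier) are hypothetical, invented for this illustration.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ContextEvent:
    """One sensed reading: identity, location, and time (the T, I, L of TILE)."""
    identity: str                                  # who or what was sensed
    location: str                                  # where the sensing happened
    timestamp: float = field(default_factory=time.time)

class EntityRegistry:
    """Associates sensor identifiers with entities in the physical world (the E)."""
    def __init__(self):
        self._bindings = {}

    def bind(self, sensor_id, entity):
        self._bindings[sensor_id] = entity

    def resolve(self, sensor_id):
        return self._bindings.get(sensor_id, "unknown")

registry = EntityRegistry()
registry.bind("badge-42", "Alice")   # a hypothetical active badge
event = ContextEvent(identity=registry.resolve("badge-42"),
                     location="Room 383")
print(event.identity, event.location)  # → Alice Room 383
```

An application built on such primitives never touches sensor-specific code; it consumes events whose identity and location fields have already been resolved.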
Our collected experience building a variety of capture applications has led to the development of a capture toolkit to enable rapid development. The primitives of this toolkit are captured objects (elements created at some time that are aggregated to create a stream of information), capture surfaces (the logical container for a variety of streams of captured objects), service providers (self-contained systems that produce streams of recorded information), capture clients (interactive windows that control one or more capture surfaces and service providers), capture servers (multithreaded servers that maintain relationships between capture clients and service providers and handle storage and retrieval to a multimedia database), and access clients (programs that aggregate captured material for visualization and manipulation).
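The relationships among the first few of these primitives can be sketched as simple data structures. This is only a shape for the concepts named above, under the assumption that captured objects carry a timestamp and payload; the actual toolkit's types and methods are not specified in the text.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CapturedObject:
    """An element created at some moment during a live experience."""
    timestamp: float
    payload: object

@dataclass
class CaptureStream:
    """A time-ordered sequence of captured objects from one producer
    (e.g., a service provider recording ink or audio)."""
    name: str
    objects: List[CapturedObject] = field(default_factory=list)

    def append(self, obj: CapturedObject):
        self.objects.append(obj)

@dataclass
class CaptureSurface:
    """Logical container aggregating several streams (e.g., ink plus audio)."""
    streams: List[CaptureStream] = field(default_factory=list)

    def add_stream(self, stream: CaptureStream):
        self.streams.append(stream)

ink = CaptureStream("ink")
ink.append(CapturedObject(12.5, "penstroke #1"))
surface = CaptureSurface()
surface.add_stream(ink)
print(len(surface.streams), len(ink.objects))  # → 1 1
```

Capture clients, capture servers, and access clients would then coordinate, persist, and replay these surfaces and streams.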
Perhaps the biggest open challenge for toolkit design is the scalable interface problem. We deal with a variety of physical devices, and they differ greatly in their size and interaction techniques. This variability in the physical interface currently requires an application programmer to essentially rewrite an application for each of the different devices. The application, in other words, does not scale to meet the requirements of radically different physical interfaces.
One approach to scaling interfaces is through automated transformation of the interaction tree representing the user interface. There has been some initial research on ways to transform an interface written for one device into an equivalent application on a different device. For example, the Mercator project automatically converts an X-based graphical interface into an auditory interface for blind users [19]. Similar transformation techniques were employed by Smith to automatically port a GUI to the Pilot [23]. The focus of this prior work has been the transformation of an existing application. Another avenue to pursue is more abstract interface toolkits that can be bound to appropriate physical interfaces for different devices. Just as a windowing system promotes a model of an abstract terminal to support a variety of keyboard and pointing/selection devices, so too must we look for the appropriate interface abstractions that will free the programmer from the specifics of desktop, hand-held and hands-free devices.
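The abstract-terminal analogy can be made concrete with a sketch: a device-independent interaction description bound at runtime to a device-specific renderer. The widget and renderer names below are hypothetical, and a real toolkit would cover far more interaction types than this single "pick one option" abstraction.

```python
from abc import ABC, abstractmethod

class AbstractChoice:
    """Device-independent description of a 'pick one option' interaction."""
    def __init__(self, prompt, options):
        self.prompt = prompt
        self.options = options

class Renderer(ABC):
    """Binding of abstract interactions to one physical interface."""
    @abstractmethod
    def render(self, widget: AbstractChoice) -> str: ...

class DesktopRenderer(Renderer):
    def render(self, widget):
        # A GUI would build a menu; here we just show its textual shape.
        return f"[menu] {widget.prompt}: " + " | ".join(widget.options)

class SpeechRenderer(Renderer):
    def render(self, widget):
        # A hands-free device might speak a prompt instead.
        return f"Say one of {', '.join(widget.options)}. {widget.prompt}."

# The application describes the interaction once; each device binds it.
choice = AbstractChoice("Choose a view", ["Notes", "Slides"])
print(DesktopRenderer().render(choice))
print(SpeechRenderer().render(choice))
```

Under this scheme, the application is written once against the abstract widgets, and scaling to a radically different device means supplying a new renderer rather than rewriting the application.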