Theories
Viewing this software from the active user perspective, we identified the following concerns. Each concern is labeled by relevance to the production paradox (PP) or the assimilation paradox (AP).
Our measurements show that the software does not meet this expectation. We found voice control to be about twice as slow as the keyboard and mouse. Generally, we experienced a sense of disappointment at this discovery.
Because the system must adapt to the user, some minimal training is always required. However, the training only requires about one minute and involves reading a list of the most common commands. At the end of the training, the software suggests running the tutorial which requires an additional 15 minutes. We were pleased at the small amount of training required to begin using the software.
Without prior knowledge of voice control software, one would assume that the recognition would be perfect. However, our experience revealed that correct recognition depends on a number of variables, including microphone quality, the speaker's distance from the microphone, environmental sounds, pronunciation, and tone level. When one of these variables departs from the norm, recognition can degrade rapidly, causing significant user frustration.
The following excerpt comes from the DragonDictate for Windows User's Guide: "You must correct recognition errors so that DragonDictate gets better at recognizing your speech. If you don't correct errors in dictate mode, DragonDictate's speech recognition actually gets worse!" Despite this warning, we tended to skip correction when several recognition errors were made in close succession. We were disappointed at needing to stop our work and change focus. Often we found that we forgot our original task if the correction process was especially lengthy.
DragonDictate has two approaches to learning. One approach only allows the software to learn from errors corrected by the user. The other approach learns from each word spoken. Performance degrades with neglect only when the latter approach is enabled in the options settings. These learning approaches also blur the meaning of the correction command [Scratch That]. During dictation, this command is meant to remove a word that is unwanted but was correctly recognized. When the learn-on-correction method is in effect, [Scratch That] is simply a way to delete an unwanted word. However, when the learn-always method is in effect, the software learns even from words removed with this command. These distinctions are quite subtle and can interfere with user performance.
Users do not expect to need to change between dictation mode and command mode. When using the keyboard and mouse, mode changes are not needed since function keys are always available. The nearest analog to a mode change with the keyboard is the Caps Lock key, and nearly everyone has experienced the frustration of forgetting this "mode change". Quite often we were surprised when a command name was typed into the current document because we forgot to switch to command mode.
The voice control language of DragonDictate parallels keyboard and mouse commands. For instance, users can close a window by saying [Alt Key] [F4]. Other examples include common terms like [Maximize], [Minimize], [Page Up], and [Page Down]. This similarity allows users to draw from a rich store of prior knowledge when using the voice control. When we were unsure of what command to use, we thought back to the corresponding keyboard command and spoke the name of that command. We were pleased to find this method highly successful.
Mapping mouse control to voice control is tremendously difficult. DragonDictate offers two solutions to this problem: direction commands and the mouse grid. While the direction commands are consistent (e.g. [Mouse Up] and [Mouse Down]), controlling the speed and direction of the pointer with these commands is tedious at best. The mouse grid provides some improvement in accuracy. The grid divides the screen into nine numbered sectors. Speaking a sector name moves the pointer to the center of that sector and partitions the sector. Four iterations of this process are sufficient to pinpoint any object on the screen. However, using the mouse grid is still less efficient than the mouse and requires much more cognitive effort.
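The mouse grid procedure described above amounts to repeated 3-by-3 subdivision of the screen. The following is a minimal sketch of that idea, not DragonDictate's implementation. The function name, the row-major numbering of the sectors (1 to 9, left to right and top to bottom), and the screen dimensions are all assumptions made for illustration.

```python
def mouse_grid_target(width, height, sectors):
    """Follow a sequence of spoken sector numbers through a 3x3 mouse grid.

    Assumption: sectors are numbered 1-9 in row-major order (left to right,
    top to bottom). Returns the pointer position (center of the final
    sector) and the size of that sector, both in pixels.
    """
    left, top = 0.0, 0.0
    cell_w, cell_h = float(width), float(height)
    for sector in sectors:
        if not 1 <= sector <= 9:
            raise ValueError(f"sector must be between 1 and 9, got {sector}")
        row, col = divmod(sector - 1, 3)
        # Each spoken sector shrinks the active region to one ninth of its area.
        cell_w /= 3
        cell_h /= 3
        left += col * cell_w
        top += row * cell_h
    center = (left + cell_w / 2, top + cell_h / 2)
    return center, (cell_w, cell_h)


# Hypothetical 810x810 screen, chosen so the numbers divide evenly.
center, cell = mouse_grid_target(810, 810, [9, 9, 9, 9])
print(center, cell)
```

Under these assumptions, four spoken sectors shrink each dimension by a factor of 81. On a hypothetical 1024 by 768 display, the remaining sector would be about 12.6 by 9.5 pixels, which is consistent with the observation that four iterations are enough to reach any on-screen object, at the cost of four separate utterances.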
On several occasions, the processing time required for a command would inhibit the software from responding to the next command. Despite using the software on machines with various processor speeds, we found that the software response remained sluggish at times. We were disappointed with this performance since it reduced our overall productivity.
Summary
DragonDictate includes a few design features which attempt to mitigate and attack the production paradox. The features that mitigate this paradox by reducing the effort needed to learn the software are the quick reference cards provided with the documentation, the simple and short tutorial, the one-minute training session, and the additional quick training which requires approximately 25 minutes. The only attempt to attack this paradox was the mandatory training session. No user profile can be created without this short training. In all, the effort to address the production paradox is limited to training and documentation. We feel this effort should be expanded to address the problems of error correction, understanding the system model, mouse pointer control, and program efficiency.
DragonDictate also attempts to exploit the assimilation paradox by designing the voice control language to mimic known keyboard commands. The design seems like a highly successful solution to us since commands reflect user expectations and leverage prior user knowledge. The software, however, fails to attack the assimilation paradox in two key areas: mode changes and learning strategies. In both of these cases, we found ourselves incorrectly applying prior knowledge.
We also identified the following four metaphors in this software. In each case, the metaphor allows the user to leverage prior knowledge in learning and using DragonDictate.
Whenever users give the [Oops] command to correct a recognition mistake, a dialog box appears which lists recent commands in chronological order. Users can browse through this word history by scrolling to the left or right. Moving to the left reveals earlier spoken commands, just as viewing the left side of a historical timeline reveals older events. This approach allows most users to intuitively discern the ordering of commands in the correction dialog.
The voice control language of DragonDictate parallels keyboard commands. By giving the command [Alt Key] followed by a letter or function key name, users can access all the commands assigned to the alternate key. For instance, users can scroll through open windows by saying [Alt Key] [Tab]. Similar commands are available for the other keys on the keyboard. Since these commands resemble their keyboard counterparts, learning a significant part of the command language is reduced to remembering keyboard commands.
The voice control language provides a subset of commands for controlling the mouse pointer. These commands have a consistent structure which fits the following template: [Mouse <direction>]. The direction is one of the following: up, down, left, right, upper left, upper right, lower left, lower right. These commands give the impression of one person telling another where to move the mouse or of a user giving directions to the mouse pointer.
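The regularity of this template can be illustrated with a small lookup. The sketch below is only an illustration of the template's structure, not DragonDictate's command handling. The function name, the unit-step vectors, and the convention that the y coordinate grows downward are assumptions.

```python
# Assumed unit steps (dx, dy) in screen coordinates, where y grows downward.
DIRECTIONS = {
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
    "upper left": (-1, -1),
    "upper right": (1, -1),
    "lower left": (-1, 1),
    "lower right": (1, 1),
}


def parse_mouse_command(utterance):
    """Map an utterance of the form 'Mouse <direction>' to a unit step.

    Returns None when the utterance does not fit the template.
    """
    words = utterance.lower().split()
    if len(words) < 2 or words[0] != "mouse":
        return None
    return DIRECTIONS.get(" ".join(words[1:]))


print(parse_mouse_command("Mouse Upper Left"))
```

Because every command shares the same prefix and draws on a fixed set of eight directions, a user who knows one command can predict the rest, which is the property the metaphor relies on.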
The voice control language includes a subset of commands that are unique to DragonDictate. These commands seem to follow a natural language metaphor in that they reflect what one might say in normal conversation. For example, when users must correct a recognition error, they say [Oops], which is a natural word to use in response to a mistake. The command for removing an unwanted word during dictation is [Scratch That], which is reminiscent of what one might say to a secretary taking dictation.
Summary
These four metaphors contribute to the software in three ways. First, they increase the user's understanding of the system model. The word history is a good example of how the user can readily apprehend the system model via a metaphor. Second, they decrease the time needed to learn the command language. Since the voice control language parallels keyboard commands, users can immediately apply this knowledge to use DragonDictate without learning a large vocabulary of new commands. Third, the natural language metaphor allows users to remember new commands better since command names reflect expressions used in similar real-world situations.
Euphoric Function
The Euphoric Function represents our attempt to make explicit common, subjective experiences we had while using DragonDictate. The function shows how user learning and user expectation change the user's overall view of the software. Euphoria is defined as "a feeling of well-being." In this context, we use euphoria to refer to the extent to which user expectations are met or frustrated. The peaks and troughs of the function are described below the picture.
