Friday, October 27, 2006

Research - Human Factors in Speech

Human factors in speech

For many years, human factors specialists have studied the implications of speech technology for human-computer interaction.


In addition to the physiological aspects of human factors, there are the cognitive and psychological aspects of humans interacting with speech technology in computers. For example, what constraints must users observe in their speech so that a speech recognizer can understand them?

Does constraining their speech make them more or less effective? Does it change the way they work? How do people react to synthesized speech? Do they mind if the computer sounds like a computer? Do they prefer that it sound like a human? How does computer speech affect task performance?

Now add to this the aspect of multi-modality. Some speech technology involves speech only, but a significant portion of the interfaces being designed with speech are multi-modal: they involve not just speech but other modes, such as tactile or visual. For example, a desktop dictation system involves speaking to the computer and possibly using the mouse and keyboard to make corrections. Speech added to a personal digital assistant means that people will be speaking while looking at a small screen and pushing buttons. Research is looking at when people use which modes and how they use them together.


Here, then, are some of the human factors issues surrounding speech technology.


High error rates


Although neural network technology has dramatically improved speech recognition systems, recognizers still do not hear human speech as well as humans do, especially when competing with background noise.

Much work has to be done to help humans detect errors and to devise and carry out error-recovery strategies. Imagine if every tenth key press you made on your keyboard resulted in the wrong letter appearing on the screen. This would affect your typing and your performance significantly. That describes the state of errors in speech recognition for many systems.
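To make the "every tenth key press" comparison concrete, a recognizer's accuracy is often summarized as word error rate: the number of word substitutions, insertions, and deletions needed to turn the recognized text into what was actually said, divided by the number of words spoken. The snippet below is a minimal sketch of that calculation; the example transcripts are invented for illustration and are not from the book.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: edit distance between the spoken (reference)
    and recognized (hypothesis) word sequences, divided by the number
    of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# Invented example: one wrong word out of ten spoken words is a 10% word
# error rate, roughly the "every tenth key press is wrong" situation above.
spoken     = "please send the quarterly report to the boston office today"
recognized = "please send the quarterly report to the austin office today"
print(round(word_error_rate(spoken, recognized), 2))  # 0.1
```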

Unpredictable errors


Besides relatively high error rates, the errors that speech systems make are not necessarily logical or predictable from the human's point of view. Although some are more understandable, such as hearing "Austin" when the user says "Boston", others seem illogical.

When we speak to a computer we do not appreciate the effect that such qualities as intonation, pitch, volume, and background noise can have. We think we have spoken clearly, but we may actually be sending an ambiguous signal. The computer may understand a phrase one time and misunderstand the same phrase another time. Users do not like unpredictable systems, which lowers their acceptance of and satisfaction with speech technology.
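One common way to soften unpredictable errors is to have the dialogue confirm a result when the recognizer is not sure of it, for instance when "Austin" and "Boston" score almost equally. The sketch below assumes a hypothetical recognizer that returns a ranked list of (phrase, confidence) pairs; the function names and thresholds are illustrative assumptions, not from the book.

```python
def choose_response(hypotheses, accept=0.85, margin=0.15):
    """Decide whether to act on, confirm, or reject a recognition result.

    `hypotheses` is assumed to be a list of (phrase, confidence) pairs
    sorted best-first, as many recognizers can provide (an "N-best" list).
    """
    if not hypotheses:
        return "reject", "Sorry, I didn't catch that. Please say it again."
    best_phrase, best_score = hypotheses[0]
    runner_up = hypotheses[1][1] if len(hypotheses) > 1 else 0.0
    if best_score >= accept and (best_score - runner_up) >= margin:
        return "accept", best_phrase
    # Low confidence or two close competitors: ask instead of guessing.
    return "confirm", f"Did you say {best_phrase}?"

# Invented example: "Boston" and "Austin" are nearly tied, so confirm.
print(choose_response([("Boston", 0.62), ("Austin", 0.58)]))
# ('confirm', 'Did you say Boston?')
```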

People’s expectations

Humans have high expectations of computers and speech. When they are told that a computer has speech technology built in, they often expect that they will have a natural conversation with it. They expect the computer to understand them, and they expect to understand the computer. If this human-like conversational expectation is not met (and it often is not), they grow frustrated and unwilling to talk to the computer on its more realistic terms.

However, if humans are given realistic expectations of what the computer can and cannot understand, then they are comfortable constraining their speech to certain phrases and commands. This does not seem to impede task performance. Constrained speech is not a natural way for people to talk to other people, or even a natural way for people to talk to computers. Nevertheless, within a short time users can learn and adapt well to constrained speech.

Users prefer constrained speech that works to conversational speech that results in errors.
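A constrained interface of the kind described above typically accepts only a fixed set of phrases or simple command patterns and re-prompts when the user strays outside them. The sketch below shows one way such a grammar might look; the specific commands and matching rules are invented for illustration.

```python
import re

# A tiny constrained "grammar": each entry maps an accepted spoken
# pattern to the action the application should perform.
COMMANDS = {
    r"^call (?P<name>\w+)$":   "dial_contact",
    r"^check (my )?messages$": "play_messages",
    r"^what time is it$":      "say_time",
}

def interpret(utterance):
    """Return (action, slots) if the utterance matches the grammar,
    or None so the dialogue can re-prompt with the allowed phrases."""
    text = utterance.lower().strip()
    for pattern, action in COMMANDS.items():
        match = re.match(pattern, text)
        if match:
            return action, match.groupdict()
    return None

print(interpret("Call Dana"))         # ('dial_contact', {'name': 'dana'})
print(interpret("Ring Dana for me"))  # None -> out of grammar, re-prompt
```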

Working multi-modally

Many tasks lend themselves to multi-modality. For example, a traveler may point to two locations on a map while saying "How far?" People will use one modality, such as speech alone, followed by another modality, such as pointing with a mouse or pen. In other words, they will switch between modes. Sometimes they use two or more modes simultaneously or nearly so, for example pointing first and then talking.
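In a design like the map example, the system has to decide whether a spoken phrase and a pointing gesture belong to the same request, which is often done by checking that they arrive close together in time. The sketch below is a simplified illustration of that idea; the event format and the two-second window are assumptions, not from the book.

```python
from dataclasses import dataclass

@dataclass
class PointEvent:
    x: float
    y: float
    time: float  # seconds since the interaction started

@dataclass
class SpeechEvent:
    text: str
    time: float

def fuse(points, speech, window=2.0):
    """Pair each spoken command with the pointing gestures that occurred
    within `window` seconds of it, so "how far?" can be resolved against
    the two map locations the user just touched."""
    requests = []
    for s in speech:
        nearby = [p for p in points if abs(p.time - s.time) <= window]
        requests.append((s.text, [(p.x, p.y) for p in nearby]))
    return requests

# Invented example: point at two locations, then ask "how far?".
points = [PointEvent(10.0, 42.0, time=1.1), PointEvent(55.0, 18.0, time=1.8)]
speech = [SpeechEvent("how far", time=2.4)]
print(fuse(points, speech))
# [('how far', [(10.0, 42.0), (55.0, 18.0)])]
```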

Speech-only systems tax memory

Because a speech-only system lacks visual feedback or confirmation, it taxes human memory. Long menus in telephony applications, for instance, are hard to remember.
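One practical response to this memory load is to keep each spoken prompt down to a handful of options and offer the rest in a later turn. The sketch below simply splits a long option list into short prompts; the group size and wording are illustrative choices, not prescriptions from the book.

```python
def menu_prompts(options, per_prompt=3):
    """Break a long telephony menu into prompts of at most `per_prompt`
    options each, ending all but the last with an offer to hear more."""
    prompts = []
    for start in range(0, len(options), per_prompt):
        chunk = options[start:start + per_prompt]
        lines = [f"For {label}, press {start + i + 1}."
                 for i, label in enumerate(chunk)]
        if start + per_prompt < len(options):
            lines.append("For more choices, say 'more'.")
        prompts.append(" ".join(lines))
    return prompts

# Invented example: seven departments become three short prompts.
departments = ["billing", "sales", "support", "returns",
               "shipping", "warranty", "everything else"]
for p in menu_prompts(departments):
    print(p)
```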

Spoken language is different

People speak differently than they write, and they expect systems that speak to them to use different terminology from what they may read. For example, people can understand terms such as Delete or Cancel when viewing them as button labels on a GUI screen, but they expect to hear less formal language when they listen to a computer speak.
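A speech interface therefore often needs its own wording for actions that a GUI presents as terse button labels. The mapping below is a toy illustration of that idea; the specific phrasings are invented, not taken from the book.

```python
# Invented mapping from terse GUI button labels to more conversational
# phrasing for spoken prompts and confirmations.
SPOKEN_WORDING = {
    "Delete": "remove this message",
    "Cancel": "stop and go back",
    "Submit": "send it now",
}

def spoken_confirmation(gui_label):
    """Build a confirmation prompt using spoken-style wording where we
    have it, falling back to the GUI label otherwise."""
    phrase = SPOKEN_WORDING.get(gui_label, gui_label.lower())
    return f"Do you want to {phrase}?"

print(spoken_confirmation("Delete"))  # Do you want to remove this message?
print(spoken_confirmation("Print"))   # Do you want to print?
```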

Users are not aware of their speech habits. Many characteristics of human speech are difficult for computers to understand, for example using "ums" or "uhs" in sentences, or talking too fast or too softly. Many of these characteristics are unconscious habits.


People model speech

Luckily, people will easily model another speaker's speech without realizing it. We can constrain or affect users' speech by having the computer speak the way we want them to speak. People tend to imitate what they hear!



This is taken from the book "Designing Effective Speech Interfaces" by Susan Weinschenk and Dean T. Barker.
