Friday, October 27, 2006

Exploration- User Testing Prototype 1

First level user testing involved two people, one male and one female, to test the working of the prototype.

It was quite an interesting experience to watch them interact with the system.

The video of prototype 1 is uploaded at www.freewebs.com/ruchiraparihar/proto_upload.htm

The videos of the user testing are uploaded at

www.freewebs.com/ruchiraparihar/proto_test1.htm

Exploration - White Paper ( Final Stage)

Well, the White Paper project is turning out well. It was an experiment to find out what people write (as textual graffiti) if given a chance to.

Big white sheets were put up in three computer laboratories where students work for almost 24 hours.

The three laboratories differed in the following terms:

- IT Lab: 24 students, 1 paper, no writing instrument provided
- New Media Lab: 8 students, 1 paper, a sketch pen provided as a writing instrument
- SUID Lab: 5 students, 1 paper, no writing instrument provided

The project carried on for 4 days.

Here are the images:

the first stage:
[image]

the intermediate stage:
[image]

the final stage:
[image]

The observations and deductions were:

For such a thing to take place, the instrument with which people write is a major factor that holds them back from writing, i.e. if people don't have a pen in their hand (or in the vicinity) they won't attempt to write even if they have something to write.

The above point makes it clear that users would be happy to use voice as a tool to make graffiti, because they won't have to wait for an instrument to be in their hands; all they need is their voice.

Very few people take the initiative to be the first ones to write, i.e. the initialization should already be done for people to carry on the chain reaction.

Hence, the system of Graffi-V should be such that it initiates graffiti on its own and is exciting enough to invite people. It can do things like generating a random quote on its own, or start by saying something like "who is it?", so that people answer the questions being put up and graffiti is created in response to whatever they say.

Usual scribbles on the white paper often answer the one who has written before them.
It is interesting to note this answering-back pattern.

This gives the idea of a chat-based interface, where two people talk and thereby create graffiti.

People use it as a message board to leave messages for all.

Hence the idea of a message board, or a thought-for-the-day board.

People crib a lot when given a chance to write on the paper while their identity is not revealed. I found obscene stuff on one of the white papers.

Hence there should be word filters if such a system is to be made public.
Also, it can be used as a place to vent out one's frustrations.

I also found reminders on the white paper.

Hence it can be used as a reminder board.
It can also be used to save stories (as in saving memories for others to scroll back and see).

The three labs differed in their scenarios, and this makes it clear that the more people are involved, the more fun there is and, in turn, the more involvement.


Research- Human Factors in Speech

For many years a group of human factors specialists has studied the implications of speech technology for human-computer interaction.


In addition to physiological aspects of human factors, there are the cognitive and psychological aspects of humans interacting with speech technology in computers. For example, what constraints must users observe in their speech so that a speech recognizer can understand them?

Does constraining their speech make them more or less effective? Does it change the way they work? How do people react to synthesized speech? Do they mind if the computer sounds like a computer? Do they prefer that it sound like a human? How does computer speech affect task performance?

Now add to this the aspect of multi-modality. Some speech technology involves speech only, but a significant portion of the interfaces being designed with speech are multi-modal. They involve not just speech, but other modes such as tactile or visual. For example, a desktop dictation system involves speaking to the computer and possibly using the mouse and keyboard to make corrections. Speech added to a personal digital assistant handheld device means that people will be speaking while looking at a small screen while pushing buttons. Research is looking at when people use which modes and how they use them together.


Here, then are some of the human factors issues surrounding speech technology.


High error rates


Although neural network technology has dramatically improved speech recognition systems, speech recognizers still do not hear human speech as well as humans do, especially when competing with background noise.

Much work has to be done to help humans to detect errors and to devise and carry out error strategies. Imagine if every tenth key press you made on your keyboard resulted in the wrong letter appearing on the screen. This would affect your typing and your performance significantly. That describes the state of errors with speech recognition for many systems.

Unpredictable errors


Besides relatively high error rates, the errors that speech systems make are not necessarily logical or predictable from the human's point of view. Although some are more understandable - such as hearing Austin when the user says Boston - others seem illogical.

When we speak to a computer we don't appreciate the effect that qualities such as intonation, pitch, volume, and background noise can have. We think we have spoken clearly, but we may actually be sending an ambiguous signal. The computer may understand a phrase one time and misunderstand the same phrase another time. Users do not like using unpredictable systems, which lowers acceptance and satisfaction with speech technology.

People’s expectations

Humans have high expectations of computers and speech. When they are told that a computer has speech technology built in they often expect that they will have a natural conversation with it. They expect the computer to understand them and they expect to understand the computer. If this human-like conversational expectation is not met (and it is often not met), then they grow frustrated and unwilling to talk to the computer on its realistic terms.

However, if humans are given realistic expectations of what the computer can and can't understand, then they are comfortable constraining their speech to certain phrases and commands. This does not seem to impede performance on the task. Using constrained speech is not a natural way for people to talk to other people, or even a natural way for people to talk to computers. Nevertheless, within a short time users can learn and adapt well to constrained speech.

Users prefer constrained speech that works to conversational speech that results in errors.

Working multi-modally

Many tasks lend themselves to multi-modality. For example, a traveler may point to two locations on a map while saying "how far?". People will use one modality, such as speech alone, followed by another modality, such as pointing with a mouse or pen. In other words, they will switch between modes. Sometimes they use two or more modes simultaneously or nearly so, for example pointing first and then talking.

Speech only systems tax memory

Because a speech-only system lacks visual feedback or confirmation, it is taxing on human memory. Long menus in telephony applications, for instance, are hard to remember.

Spoken language is different

People speak differently than they write, and they expect systems that speak to them to use different terminology than what they may read. For example, people can understand terms such as delete or cancel when viewing them as button labels on a GUI screen, but they expect to hear less formal language when they listen to a computer speak.

Users are not aware of their speech habits. Many characteristics of human speech are difficult for computers to understand, for example using "ums" or "uhs" in sentences or talking too fast or too softly. Many of these characteristics are unconscious habits.


People Model Speech

Luckily, people will easily model another's speech without realizing it. We can constrain or affect the user's speech by having the computer speak the way we want the user to speak. People tend to imitate what they hear!



This is taken from the book "Designing Effective Speech Interfaces" by Susan Weinschenk and Dean T Barker

Research- Types of interfaces

As usability engineers we are interested in how to make technology easy to learn and use.

We are concerned with how to design products so that people can be as productive as possible with the product, as quickly as possible. Designing to optimize usability means paying specific attention to how the interface looks and acts.

This includes

- Ensuring that the interface matches the way people need or want to accomplish a task

- Using the appropriate modality (for example, visual or voice) at the appropriate time

- Spending adequate design time on the interface

What are the types of interfaces?

There are several types of interfaces. In the past, and still around to some extent, were character-based user interfaces. Then graphical user interfaces (GUIs) became prevalent, next came web user interfaces (WUIs), and then speech user interfaces.

What is a speech interface?

It would seem simple to answer, but it is not, because it is a relatively new idea to have our technology involve speech. This is a field that is just starting to grow. Like any new field, the definitions and terminologies are not standard.

Speech interfaces

The term speech interface describes a software interface that employs either human speech or simulated human speech. You can further break down speech interfaces into auditory user interfaces and graphical user interfaces with speech.

Auditory user interfaces (AUI)

An auditory user interface is an interface which relies primarily or exclusively on audio for interaction, including speech and sounds. This means that commands issued by the machine or computer, as well as all commands issued by the human to control the machine or computer, are executed primarily with speech and sounds. Although an AUI may include a hardware component such as a keypad or buttons, visual displays are not used for critical information.

Examples are

- Medical transcription software that allows doctors to dictate medical notes while making rounds

- Automobile hands-free systems that allow drivers to access travel information and directions

- Interactive voice response systems in which users access information by speaking commands such as menu numbers to listen to information of their choice

- Products for the visually impaired that rely only on audio text and cues

Graphical user interfaces with speech

In contrast to an AUI, where the user interacts with the software primarily via speech, these interfaces combine a GUI with speech. We call these multi-modal interfaces, i.e. graphical user interfaces with speech, or S/GUI for speech/GUI. Examples are

- A word processor that allows users to dictate text instead of, or in addition to, typing it in

- Web navigation software that allows users to navigate to and within websites by using voice

- Talking dictionaries that speak definitions

In these S/GUI applications, tasks can

- Be completed using speech only, where users issue a speech command or listen to the software speak to them

- Rely on visual or manual GUI aspects, for example viewing a graphic or clicking a hyperlink

- Require, or at least allow, a combination of both speech and the GUI

Non-speech audio

Some interface elements include audio but not speech; these elements include music and sounds. Some non-speech audio is included in almost all interfaces of any type, including S/GUIs, AUIs, and GUIs. Examples of non-speech audio include:

- The computer beeps when the user makes an error

- The user clicks on a map and hears a low tone to indicate that the water to be found at that site is deep in the ground, or a high tone to indicate that the water is closer to the surface

This is taken from the book "Designing Effective Speech Interfaces" by Susan Weinschenk and Dean T Barker

Thursday, October 26, 2006

Prototype- First Prototype


Ok, the first prototype is finally coded and is ready.



I am uploading a video at 
www.freewebs.com/ruchiraparihar/proto_upload.htm

Wednesday, October 25, 2006

Thought Process - Frustration

It is frustrating when the code doesn't work the way you want it to.

I am trying to visualize speech in some way, and what the code is doing is not the way I want it to look on the screen.
I spent my day revising Visual Basic so as to incorporate Microsoft's speech recognition in it and to be able to save files in real time.
The book I am reading is Mastering VB 6, but what I have on my system is Visual Studio 2005. This so-called upgraded version lacks compatibility with the look and feel, or at least the terminology, of the older version, and I am finding it really difficult to work with.
I tried uninstalling VS 2005 and installing VB 6, but it did not happen: the setup found a DLL of a higher version which could not be over-written, and hence the setup was aborted.
So I installed Visual Studio 2005 again to keep working.. and now Microsoft Office is having some problem with VS 2005. Every time I try to open any Microsoft Office application, it tries to install some missing components (God only knows where they went) and the application either starts after a long delay or crashes.

So this explains that even if you have everything in your mind - all concepts ready - you can still be late or unable to implement, because the world is SOFT..

But in the end we are all dependent on software..

Tuesday, October 24, 2006

Explorations- Echo

I just managed to create an echo using Flash 8. Just check the following link, make sure your microphone and speakers are turned on, and you will be able to listen to your own voice!

 http://www.freewebs.com/ruchiraparihar/echovoice.swf

Make sure you have the Flash 8 ActiveX plug-in for your browser.
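For reference, here is a minimal sketch of how such an echo can be wired up in ActionScript 2. This is only my guess at how the .swf above works; it assumes the movie simply routes the microphone back to the local speakers with MovieClip.attachAudio():

// Minimal echo sketch (ActionScript 2, Flash 8)
var mic:Microphone = Microphone.get(); // triggers the Privacy dialog the first time

// keep the raw signal so the loopback is clearly audible
mic.setUseEchoSuppression(false);

// route the captured audio straight to the local speakers
// (no Flash Communication Server needed for local playback)
this.attachAudio(mic);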

Coding: Microphone properties in Flash 8

Microphone


public class Microphone
extends Object

The Microphone class lets you capture audio from a microphone attached to the computer that is running Flash Player.

The Microphone class is primarily for use with Flash Communication Server but can be used in a limited fashion without the server, for example, to transmit sound from your microphone through the speakers on your local system.

Caution: Flash Player displays a Privacy dialog box that lets the user choose whether to allow or deny access to the microphone. Make sure your Stage size is at least 215 x 138 pixels; this is the minimum size Flash requires to display the dialog box.

Users and Administrative users may also disable microphone access on a per-site or global basis.

To create or reference a Microphone object, use the Microphone.get() method.

Availability: ActionScript 1.0; Flash Player 6

Property summary



activityLevel:Number [read-only]
A numeric value that specifies the amount of sound the microphone is detecting.

gain:Number [read-only]
The amount by which the microphone boosts the signal.

index:Number [read-only]
A zero-based integer that specifies the index of the microphone, as reflected in the array returned by Microphone.names.

muted:Boolean [read-only]
A Boolean value that specifies whether the user has denied access to the microphone (true) or allowed access (false).

name:String [read-only]
A string that specifies the name of the current sound capture device, as returned by the sound capture hardware.

names:Array [read-only, static]
Retrieves an array of strings reflecting the names of all available sound capture devices without displaying the Flash Player Privacy Settings panel.

rate:Number [read-only]
The rate at which the microphone is capturing sound, in kHz.

silenceLevel:Number [read-only]
An integer that specifies the amount of sound required to activate the microphone and invoke Microphone.onActivity(true).

silenceTimeOut:Number [read-only]
A numeric value representing the number of milliseconds between the time the microphone stops detecting sound and the time Microphone.onActivity(false) is invoked.

useEchoSuppression:Boolean [read-only]
A Boolean value of true if echo suppression is enabled, false otherwise.


Event summary


onActivity = function(active:Boolean) {}
Invoked when the microphone starts or stops detecting sound.

onStatus = function(infoObject:Object) {}
Invoked when the user allows or denies access to the microphone.

Method summary

get([index:Number]) : Microphone [static]
Returns a reference to a Microphone object for capturing audio.

setGain(gain:Number) : Void
Sets the microphone gain, that is, the amount by which the microphone should multiply the signal before transmitting it.

setRate(rate:Number) : Void
Sets the rate, in kHz, at which the microphone should capture sound.

setSilenceLevel(silenceLevel:Number, [timeOut:Number]) : Void
Sets the minimum input level that should be considered sound and (optionally) the amount of silent time signifying that silence has actually begun.

setUseEchoSuppression(useEchoSuppression:Boolean) : Void
Specifies whether to use the echo suppression feature of the audio codec.

Methods inherited from class Object

-------

This makes it clear that if I use Flash Professional 8 for coding, I would be able to make use of the following properties of sound:

activityLevel, gain, index, muted, name, rate and silenceLevel.
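To make this concrete, here is a rough sketch of my own (not from the documentation above) of how these properties could be read in ActionScript 2. The text field name level_txt is just an assumption for illustration:

// Sketch: polling Microphone properties in ActionScript 2 (Flash 8)
var mic:Microphone = Microphone.get();
this.attachAudio(mic); // attach so the mic actually starts capturing

// treat anything above 10 (out of 100) as sound, time out after 2 seconds of silence
mic.setSilenceLevel(10, 2000);

// fires whenever the mic crosses the silence level in either direction
mic.onActivity = function(active:Boolean) {
    trace("mic active: " + active);
};

// poll the loudness every frame and show it in a text field on the stage
this.onEnterFrame = function() {
    level_txt.text = "level: " + mic.activityLevel + " / 100, rate: " + mic.rate + " kHz";
};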

Monday, October 23, 2006

Thought Process - Intermediate state

Right now I am trying my hand at coding, basically trying to build a prototype which will take audio input, separate it into its various parameters and map the corresponding values to text attributes.
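As a first sketch of that mapping in ActionScript 2 (the field name word_txt and the 12-60 point size range are assumptions of mine, just for illustration), the microphone's loudness could drive the size of whatever text is on screen:

// Sketch: map microphone loudness to a text attribute (font size)
var mic:Microphone = Microphone.get();
this.attachAudio(mic); // start capturing so activityLevel is updated

this.onEnterFrame = function() {
    var level:Number = Math.max(0, mic.activityLevel); // 0..100, -1 when idle
    var size:Number = 12 + (level / 100) * 48;         // map loudness to 12..60 pt

    var fmt:TextFormat = new TextFormat();
    fmt.size = Math.round(size);
    word_txt.setTextFormat(fmt); // word_txt would hold the recognized word
};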

I have been able to do speech-to-text conversion, but have not been able to integrate it with an interface so that its output can be used as input.

The project "white paper" is continuing.

At the same time I am trying to work on a script which would let people do graffiti online and save it.