Yu Zhong1, T.V. Raman2, Casey Burkhardt2, Fadi Biadsy2 and Jeffrey P. Bigham1,3
To do so, we maintain a view indexer in JustSpeak. This indexer is essentially a hashmap from text labels or content descriptions to their associated interactive interface elements. For example, if an image button is labeled ’compose’ by the developer, we keep an entry mapping the word ’compose’ to this button even if the label is not shown on the screen. Since the contents shown on the screen are constantly updated, JustSpeak listens to all types of accessibility events 1. Those events include the source of the updated view and metadata such as the time stamp, details of the modification, the identity of the corresponding application, etc. They empower JustSpeak to dynamically update the indexer and keep it fresh.

For example, when the text inside a textbox is modified by the user or the system, JustSpeak receives a View Changed event and then swaps the stale node and its descendants in the indexer with the newer ones passed in with the event.

This indexer comes in handy when a local command is handed to the execution module: we can query the indexer with the argument string to find matching nodes shown on the screen. Since user inputs are spontaneous and do not always match the labels defined by developers, a flexible matching mechanism is necessary. We therefore use a word-overlap based ranking algorithm, so that users do not have to say the exact label to find an on-screen object. For example, if a command specifies ’compose message’ and there are two buttons on the screen whose labels contain the word ’message’, then ’compose a new message’ will yield a higher score than ’show message’, and JustSpeak will perform the command on the ’compose a new message’ button.

This algorithm is used for all local commands that require the user to specify the name of an interface element, including activation, switch toggling, long pressing and checkbox updating, as shown in table 2. Once a unique node is found, JustSpeak validates whether the command can be performed on that specific object with the accessibility APIs; if not, it continues to a lower-ranked object. For instance, the checking command can only be applied to checkboxes. For scrolling commands, JustSpeak only needs to find a scrollable item in the indexer, because in most mobile interfaces the chance of more than one scrollable view existing on screen together is minimal.

As described before, when using online ASR there are usually several scored recognition results. JustSpeak processes them in descending score order, checking whether each one’s arguments correspond to a unique actionable node in the indexer. This way, even if ASR produced errors in the highest-scored result because of a user’s accent or other factors, JustSpeak still has a good chance of executing the requested command. If none of the candidates has an argument that can be validated against the indexer, the execution attempt is considered failed. The result of execution is announced with synthesized speech.

Chaining of Commands
Support for multiple commands in a single utterance is an important feature of JustSpeak for two reasons. Firstly, it is more time efficient to combine multiple commands into one sentence than to repeat the whole dialog turn several times; secondly, expressing ordered requests in a single utterance is more natural and consistent with the way spontaneous speech is produced. An example of using chained voice commands is illustrated in figure 3.

1 Android accessibility events, http://developer.android.com/reference/android/view/accessibility/AccessibilityEvent.html.
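The matching flow described above, ranking indexed nodes by word overlap with the spoken argument and validating ASR hypotheses in descending score order, might be sketched as follows. This is a minimal illustration, not JustSpeak's actual code; the `Node`, `bestMatch` and `resolve` names are hypothetical.

```java
import java.util.*;

// Sketch of JustSpeak-style matching: rank indexed interface nodes by
// word overlap with the spoken argument, then try each recognition
// hypothesis in descending score order until one validates.
public class NodeMatcher {
    /** Stand-in for an indexed interactive interface element. */
    public static final class Node {
        public final String label;
        public final boolean actionable;
        public Node(String label, boolean actionable) {
            this.label = label;
            this.actionable = actionable;
        }
    }

    /** Fraction of the argument's words that also appear in the label. */
    static double overlapScore(String argument, String label) {
        Set<String> argWords = new HashSet<>(Arrays.asList(argument.toLowerCase().split("\\s+")));
        Set<String> labelWords = new HashSet<>(Arrays.asList(label.toLowerCase().split("\\s+")));
        int common = 0;
        for (String w : argWords) {
            if (labelWords.contains(w)) common++;
        }
        return common / (double) argWords.size();
    }

    /** Highest-scoring actionable node, or null if nothing overlaps. */
    static Node bestMatch(String argument, List<Node> indexer) {
        Node best = null;
        double bestScore = 0.0;
        for (Node n : indexer) {
            double s = overlapScore(argument, n.label);
            if (n.actionable && s > bestScore) {
                best = n;
                bestScore = s;
            }
        }
        return best;
    }

    /**
     * Tries each recognition hypothesis in descending score order; the
     * first one whose argument resolves to an actionable node wins.
     * Returns null when no hypothesis validates (execution attempt fails).
     */
    static Node resolve(List<String> rankedHypotheses, List<Node> indexer) {
        for (String argument : rankedHypotheses) {
            Node match = bestMatch(argument, indexer);
            if (match != null) return match;
        }
        return null;
    }
}
```

With the paper's example, the argument ’compose message’ scores 1.0 against a button labeled ’compose a new message’ (both words present) but only 0.5 against ’show message’, so the former is chosen.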
In the JustSpeak framework, the utterance parsing and command execution modules work together to understand utterances containing more than one command. All of the supported functions listed in table 2 can be chained into a single sentence. As shown in figure 3, the utterance is parsed into an array of commands in the order in which they are placed in the sentence. One challenge in utterance parsing is disambiguation. For example, the sentence ’click reset and home’ can be interpreted as either one command, clicking a button labeled ’reset and home’, or two commands, clicking the button ’reset’ and then navigating back to the home screen. To improve disambiguation, the grammar of each supported action has to be defined in as much detail as possible; JustSpeak also assigns higher priority to commands that can be validated on the current screen. Taking the sentence above as an example, if there is a button labeled ’reset’ on the screen, the two-command interpretation will be preferred.

Since execution of an action usually causes a number of interface updates (for example, clicking a button often results in the opening of a new page, a dialog or an application), the array of commands cannot be executed all at once. Instead, the execution module validates the arguments of the first command and executes it if possible, waits for a short time until all the accessibility events fired by that command have settled, and then proceeds to the next command on the updated screen. Theoretically, users can chain as many commands as they wish, but in practice we found that speech recognition errors build up quickly as utterance length grows. With the error tolerance mechanism discussed before, JustSpeak is able to reliably support normal-length spontaneous utterances containing four or fewer commands.

USE CASES AND DISCUSSION
JustSpeak innovatively enhances all Android applications and the Android system itself by means of natural and fast voice control that can be accessed non-visually and hands-free across the whole platform. Its value is not limited to assisting blind users; in fact, all Android device users can benefit from JustSpeak in a variety of cases.

For Blind Users
As the primary user group of non-visual interaction techniques, blind users of Android devices can, we believe, interact with their smart phones faster and more easily with the assistance of JustSpeak. In fact, this project was initially designed specifically for blind users; we only discovered its value for other user groups later during the course of development.

As described before, most blind people use screen readers to interact with their computers and portable electronic devices. On Android smart phones and tablets, JustSpeak does not interfere with existing screen readers and other accessibility services (e.g. TalkBack). On the contrary, they work together to offer a better user experience to blind users. In fact, since blind users perceive the representation of user interfaces as text read by screen readers, they are already aware of the strings associated with each object that can be spoken as valid voice commands. Therefore, they can more easily familiarize themselves with JustSpeak and get the best out of it. We have observed that in many cases blind users spend a large amount of time looking for a specific object on their phones. The main reason is that screen readers are designed as linear, iterative screen explorers; although blind users can usually identify an on-screen control by its first few words and fast forward with increased text-to-speech speed, it still takes them more time and effort to locate a target on the screen than it takes sighted users, even when they know it is shown on the screen. In unfamiliar applications this problem can be even worse. JustSpeak can significantly reduce the time and effort blind users need to access application controls. A simple case in which blind users of Android can benefit greatly from JustSpeak is launching applications. Nowadays one can easily find hundreds of applications installed on a personal Android device; even with visual access to the screen, it is usually a nightmare to fumble through pages of application icons to spot the one to open. With the assistance of JustSpeak, this operation can be as easy as saying the application name. In fact, after the release of JustSpeak, we have noticed many users, both blind and sighted, use it to launch applications regularly.
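The serialized execution of chained commands described earlier, execute one command, wait for the accessibility events it fired to settle, then resolve the next command against the updated screen, might be sketched as follows. The `ChainExecutor` and `Command` names are illustrative, not JustSpeak's actual implementation.

```java
import java.util.*;

// Sketch of serialized execution of a chained-command array: commands
// run one at a time, and the chain stops at the first command whose
// argument fails to validate against the current screen.
public class ChainExecutor {
    /** A parsed command; tryExecute() validates and runs it if possible. */
    public interface Command {
        boolean tryExecute();
    }

    /** Stand-in for "wait until accessibility events settle". */
    private final Runnable waitForUiToSettle;

    public ChainExecutor(Runnable waitForUiToSettle) {
        this.waitForUiToSettle = waitForUiToSettle;
    }

    /** Executes commands in order; returns how many succeeded. */
    public int run(List<Command> chain) {
        int executed = 0;
        for (Command c : chain) {
            if (!c.tryExecute()) break; // argument did not validate: stop
            executed++;
            waitForUiToSettle.run();    // let this command's UI updates settle
        }
        return executed;
    }
}
```

In a real service the settle step would observe accessibility events rather than sleep, so each subsequent command is matched against the refreshed view indexer.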