

What influences dwell time during source code reading? Analysis of element
type and frequency as factors

Conference Paper · March 2014


DOI: 10.1145/2578153.2578211


What influences dwell time during source code reading? Analysis of element
type and frequency as factors
Teresa Busjahn∗, Department of Computer Science, Freie Universität Berlin
Roman Bednarik†, School of Computing, University of Eastern Finland
Carsten Schulte‡, Department of Computer Science, Freie Universität Berlin

∗ e-mail: teresa.busjahn@fu-berlin.de
† e-mail: roman.bednarik@uef.fi
‡ e-mail: carsten.schulte@fu-berlin.de

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
ETRA 2014, March 26 – 28, 2014, Safety Harbor, Florida, USA.
2014 Copyright held by the Owner/Author.
ACM 978-1-4503-2751-0/14/03

Abstract

While knowledge about reading behavior in natural-language text is abundant, little is known about the visual attention distribution when reading source code of computer programs. Yet, this knowledge is important for teaching programming skills as well as designing IDEs and programming languages. We conducted a study in which 15 programmers with various levels of expertise read short source codes while their eye movements were recorded. In order to study attention distribution on code elements, we introduced the following procedure: First we pre-processed the eye movement data using log-transformation. Taking into account the word lengths, we then analyzed the time spent on different lexical elements. The results show that most attention is directed towards understanding identifiers, operators, keywords and literals, while relatively little reading time is spent on separators. We further inspected the attention on keywords and provide a description of the gaze on these primary building blocks for any formal language. The analysis indicates that approaches from research on natural-language text reading can be applied to source code as well, though not without review.

CR Categories: H.1.2 [User/Machine Systems]: Software psychology—Code reading;

Keywords: code reading, program comprehension, eye tracking

1 Introduction

Reading, with its central role in education and everyday interaction, is one of the better understood areas of human behavior where eye tracking has provided important insights. Reading of a natural-language text generates patterns of gaze that are today known to the extent that computational models are able to predict them with high accuracy [Rayner 1998].

Although reading is ubiquitous in everyday life, little is known about readership skills in domains of reading other than natural-language text (NLT). The present work focuses on reading of source code (SC) of computer programs, for its importance in computing education, and design of programming environments and languages. Teaching programming skills, as part of computer literacy education, has often been approached from the production of code. However, the ability to program comprises not only production but also understanding of already existing source codes [Busjahn and Schulte 2013]. Source codes of programs are artifacts computer users create and share, and thus one needs to possess skills to understand the work of others. Therefore, although largely neglected in computing education, the skills to read source codes of others are as important as code development.

The design of tools for programming has been driven by studies of cognitive processes of programmers as they attempt to understand programs. It is indeed comprehension rather than perception that makes code reading a complex task. Comprehension of a program is one of the most challenging tasks for a programmer, because other programming activities depend on it [Wiedenbeck et al. 1993].

Code reading and comprehension skills are very important in industrial software development. In industry, it is common that large software projects are shared between numerous developers. Fluctuation in workforce requires newcomers to be able to quickly acquire the knowledge about the program code. Program maintenance, updating, adaptation and migration require excellent program comprehension skills [von Mayrhauser and Lang 1999].

As yet, there is no comprehensive research on code reading. Nevertheless, former eye tracking studies on programming include e.g. methods for reviewing software [Uwano et al. 2007], investigating the impact of program identifier styles [Binkley et al. 2012] and debugging behavior of novices and experts [Bednarik et al. 2006; Bednarik 2012]. We will describe the distribution of visual attention on different code elements using dwell time. It denotes the sum of all fixation durations during a single visit of an area of interest (AOI). Dwell time is sometimes also called gaze time or gaze duration. However, [Holmqvist et al. 2011] suggest using the term dwell, since it is less ambiguous than gaze. Subsequent visits to the AOI mark new dwells. Total dwell time comprises the sum of all dwell times on an element over the complete trial. If an element receives relatively higher dwell time, it gets more attention. This can indicate rich information and/or higher complexity of the element.

2 Study methodology

2.1 Java Code

Source code, even though executable and highly formalized, is nevertheless a form of text and read by humans. To examine to what extent findings from natural-language text reading can be transferred to code, we conducted an eye tracking study on code reading. As the programming language we chose Java due to its wide use and representativeness. Since the grammatical category of a word can influence fixation time, we are studying the lexical structure of Java. It has three types of input elements: whitespace, comment, and token. As tokens are the substantial elements of source code, we concentrate on them. Identifiers are sequences of letters and digits that denote variables, methods etc., e.g. ExampleClass or main. Java has 50 Keywords, which are reserved and cannot be used as identifiers. Often used keywords are if and return. A Literal is the code representation of a value of a primitive type, a String ("Hello World!"), or the null type. Examples of primitive-type values are numbers like 42, the boolean values true and false, and characters such as 'a'. The nine Separators are: ( ) { } [ ] . , and ;. Finally there are Operators, which consist of one or more characters, like = and ++ [Gosling et al.]. Deducing from findings in natural-language text processing, we expect that the distribution of visual attention varies depending on the type of lexical elements.
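The five token categories described above can be illustrated with a small classifier. The following Python sketch is a deliberately simplified stand-in for a real Java lexer, not the procedure used in the study: the regular expressions, the function name `classify`, and the handling of numeric literals (plain integers and decimals only) are assumptions made for illustration. The keyword set is the list of 50 reserved words in the Java SE 7 language specification; true, false and null count as literals.

```python
import re

# The 50 reserved keywords of Java SE 7.
KEYWORDS = frozenset("""
abstract assert boolean break byte case catch char class const continue
default do double else enum extends final finally float for goto if
implements import instanceof int interface long native new package private
protected public return short static strictfp super switch synchronized
this throw throws transient try void volatile while
""".split())

# Simplified lexical grammar (an assumption of this sketch): one named
# alternative per kind of input element, tried from left to right.
TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<whitespace>\s+)
  | (?P<literal>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])'|\d+(?:\.\d+)?)
  | (?P<word>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<separator>[(){}\[\];,.])
  | (?P<operator>>>>=|<<=|>>=|>>>|\+\+|--|&&|\|\||==|!=|<=|>=|<<|>>|[-+*/%&|^]=|[-+*/%&|^=<>!~?:])
""", re.VERBOSE | re.DOTALL)


def classify(source):
    """Return (category, text) pairs for the tokens in a Java snippet.

    Whitespace and comments are input elements but not tokens, so they
    are skipped.
    """
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("whitespace", "comment"):
            continue
        if kind == "word":
            # A word is a keyword, a boolean/null literal, or an identifier.
            if text in KEYWORDS:
                kind = "keyword"
            elif text in ("true", "false", "null"):
                kind = "literal"
            else:
                kind = "identifier"
        tokens.append((kind, text))
    return tokens


if __name__ == "__main__":
    for kind, text in classify('if (x == 42) { return "Hello World!"; }'):
        print(f"{kind:10} {text}")
```

Each classified token could then serve as one AOI whose category is known, which is the grouping the later analysis relies on.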
Fixation duration during reading is influenced by a number of fac-
tors, like the word’s length, predictability, and frequency. Read-
ers tend to spend more time looking at low-frequency words than
at high-frequency words. Consistent with other studies, [Rayner
and Duffy 1986] found that both the first fixation duration and the
dwell time on infrequent words were significantly longer than on
their more frequent controls. Hence, low-frequency Java keywords
ought to get longer first fixation durations and dwell times than
high-frequency keywords. First fixation duration and first dwell
time are usually correlated, since the dwell time includes the first
fixation. However, first fixation duration is associated with lexical
access, while dwell time additionally includes post-access integra-
tion processes.
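To make the relation between these measures concrete, the following Python sketch derives first fixation duration, first dwell time and total dwell time from a time-ordered list of fixations. The input format (pairs of AOI label and duration in milliseconds, with None for fixations outside every AOI) and the names `DwellStats` and `dwell_metrics` are assumptions made for this illustration; they do not come from the study's tooling.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class DwellStats:
    first_fixation: float  # duration of the very first fixation on the AOI
    first_dwell: float     # summed fixation durations of the first visit
    total_dwell: float     # summed fixation durations over all visits
    dwell_count: int       # number of separate visits (dwells)


def dwell_metrics(fixations):
    """Compute per-AOI dwell measures.

    `fixations` is a time-ordered iterable of (aoi, duration_ms) pairs.
    A dwell is a run of consecutive fixations on the same AOI; leaving
    the AOI and returning later starts a new dwell.
    """
    stats = {}
    for aoi, run in groupby(fixations, key=lambda fixation: fixation[0]):
        if aoi is None:  # fixation outside all AOIs: only separates dwells
            continue
        durations = [duration for _, duration in run]
        dwell = sum(durations)
        if aoi not in stats:
            stats[aoi] = DwellStats(durations[0], dwell, dwell, 1)
        else:
            stats[aoi].total_dwell += dwell
            stats[aoi].dwell_count += 1
    return stats


if __name__ == "__main__":
    recording = [("if", 200), ("if", 150), ("x", 300), (None, 100), ("if", 120)]
    for aoi, s in dwell_metrics(recording).items():
        print(aoi, s)
```

Because the first dwell always contains the first fixation, the two values coincide whenever the first visit consists of a single fixation, which is one reason the measures tend to correlate.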

2.2 Study Design

Our feasibility study included natural-language text (German) and source code [Busjahn et al. 2011]. A set of 11 small programs of varying complexity was developed together with multiple choice questions on the programs' algorithmic idea. The code includes fundamental concepts like loops and branches, as well as keywords with different frequencies. The identifiers were English, but not too descriptive e.g. start calc. In a pre-study the initial code sample was tested on five novices and five experts. The results and comments of these subjects were used to refine the code examples and to arrange a sequence from easy to difficult. The final design contained two German texts and ten Java programs, both with comprehension questions.¹

¹ The source codes and comprehension questions can be found at http://www.mi.fu-berlin.de/en/inf/groups/ag-ddi/Forschung/Materialien/index.html.

A total of 15 subjects participated in the study. First, they filled in a questionnaire about their programming experience. Their skills ranged from complete novice to expert in different languages. During reading of natural-language text and code, an eye tracker recorded the subjects' eye movements. The experiment was conducted in a regular office setting, using a Tobii T120. Working with a computer is a quite natural setting for programming tasks, so a very lifelike situation was achieved.

2.3 Data transformation

As pointed out by [Holmqvist et al. 2011], dwell time distributions tend to be right-skewed. This positive skew can be seen both in the total dwell times of natural-language text and source code. Therefore log-transformation was used to achieve a normal distribution (Figure 1).

Figure 1: Skewed total dwell times on the left, log-transformed dwell times on the right.

3 Results

3.1 Normalization to Length

Normalization problem: The more characters an element contains, the longer it takes to read it. This effect, known from natural-language text (Figure 2), can also be found in source code reading (Figure 3).

Figure 2: Dwell time on NLT words according to their length.

Figure 3: Dwell time on SC elements according to their length.

Some normalization has to be done to compensate for this length-effect. As the general approach we opted for the overall mean plus unstandardized residuals of a linear model with the number of characters as the independent variable. Using the linear model gathered from natural-language text to correct for the length-effect in source code proved to be insufficient, since the effect is smaller for the source codes used. A linear model had to be created for every set of texts anew.

3.2 Attention Distribution over Lexical Elements

After correcting for length (Figure 4), the time spent on the categories was normalized by element frequency per category. The difference between uncorrected and corrected times is minimal (Table 1). Still the check for the length-effect is necessary. Since we found a correlation between length of element and time spent on it, this effect has to be taken into account when analyzing data on a more granular level, e.g. within single categories.

Figure 4: Dwell time on SC elements corrected for element length.

Code category   % Uncorrected   % Corrected
Separators       8.19            8.39
Literals        20.17           20.30
Keywords        21.96           21.67
Operators       23.22           23.67
Identifiers     26.46           25.97

Table 1: Percentage of total dwell time per category

Our first hypothesis was that the distribution of visual attention varies depending on the type of lexical elements of source code. This was confirmed, though only Separators are substantially different from the other categories. A similar study conducted by Crosby & Stelovsky also analyzed how programmers distribute their attention on different code areas [Crosby and Stelovsky 1990; Crosby et al. 2002]. They divided a binary search program written in Pascal into categories of diverging complexity, namely comments, comparisons, complex statements, simple assignments and keywords. The keywords used by Crosby & Stelovsky were begin, end, if, repeat, then, and until. Comments were looked at most often, while keywords got the least attention. The authors conclude that keywords are of low semantic value.

The data was later re-evaluated. The number of fixations was normalized by dividing it by the number of words / characters in each category. The correction for element length mainly affects comments, which are basically natural-language text: they now get much less attention. This emphasizes the conclusion that the length-effect is mainly potent in natural-language text. However, keywords still get the least attention.

Besides keywords, the categories of interest for Crosby & Stelovsky contained more characters and were composed of symbols, identifiers and literals, and were therefore much more complex than the keywords. Grasping their combined idea is supposedly much harder than perceiving the meaning of a single keyword.

This work illustrates the need to normalize visual attention over source code elements according to length, which is addressed in [Crosby et al. 2002]. After first using the uncorrected values, the normalization by the number of words / characters is a first approach to take element length into account. This remains a compromise, since neither words nor characters correspond to a unit that can be perceived with one fixation. We addressed this issue by using linear models.

In contrast to our data, their keywords got the least visual attention both before and after length correction. This is not necessarily a conflict. One of the reasons might be that they used Pascal and we used Java. Furthermore, they employed number of fixations as measure, while we utilize total dwell times. If elements get lots of short fixations, or few but very long ones, their results and ours will differ considerably.

Another cause is probably that their keywords include begin and end, which function as separators. In our experiment separators got the least total dwell time, so begin and end might have been hardly looked at as well. Since they constitute one third of the keyword set, their short dwell times have a great influence on the total time spent on keywords. Moreover, the earlier study had six keywords in one program, while we have 23 that are distributed over ten programs, some of which appear several times.

3.3 Keyword Frequency

Word frequency relates to first fixation duration and first dwell times on words. Less frequent words induce longer fixation durations and dwell times. [Rayner and Duffy 1986] assume that fixation times on a given word are an indicator of how easy or difficult it is for a specific reader to understand a certain word. The frequency of Java keywords depicted in Figure 5 was gathered by counting the occurrences of each keyword in the JDK 7. This project contains over 7 million tokens, over a million of them being keywords. The frequency of elements depends on the corpus used. The JDK 7 is the core of Java and reasonably large to give a notion of the occurrence of keywords.

Figure 5: Frequency of keywords in %.

Figure 6: First fixation duration on keywords.
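Counting keyword occurrences in a corpus of Java files, as done for the JDK 7, can be sketched in a few lines of Python. This is an illustrative approximation and not the counting procedure behind Figure 5: comments and string or character literals are blanked out with regular expressions rather than by a full lexer, and the helper names (`keyword_counts`, `keyword_percentages`, `read_corpus`) as well as any corpus directory passed to `read_corpus` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# The 50 reserved keywords of Java SE 7.
JAVA_KEYWORDS = frozenset("""
abstract assert boolean break byte case catch char class const continue
default do double else enum extends final finally float for goto if
implements import instanceof int interface long native new package private
protected public return short static strictfp super switch synchronized
this throw throws transient try void volatile while
""".split())

# Regions in which a keyword-like word must not be counted.
_SKIP = re.compile(r"""
    //[^\n]*                 # line comment
  | /\*.*?\*/                # block comment
  | "(?:\\.|[^"\\\n])*"      # string literal
  | '(?:\\.|[^'\\\n])*'      # character literal
""", re.VERBOSE | re.DOTALL)
_WORD = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def keyword_counts(source):
    """Count keyword occurrences in one Java source text."""
    code = _SKIP.sub(" ", source)
    return Counter(word for word in _WORD.findall(code) if word in JAVA_KEYWORDS)


def keyword_percentages(sources):
    """Relative keyword frequencies in percent over an iterable of sources."""
    total = Counter()
    for source in sources:
        total += keyword_counts(source)
    n = sum(total.values())
    if n == 0:
        return {}
    return {keyword: 100.0 * count / n for keyword, count in total.most_common()}


def read_corpus(root):
    """Yield the text of every .java file below `root` (hypothetical location)."""
    for path in Path(root).rglob("*.java"):
        yield path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    sample = 'class A { /* if if */ int f(int x) { if (x > 0) return x; String s = "return"; return 0; } }'
    for keyword, share in keyword_percentages([sample]).items():
        print(f"{keyword:8} {share:5.1f} %")
```

As the text notes, the resulting percentages depend on the corpus, so a different code base passed through `read_corpus` would generally yield a different ranking.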

We tested the assumption that low-frequency Java keywords get higher first fixation durations and first dwell times than high-frequency keywords. After removing potential outliers, first fixation duration increases with word frequency (Figure 6). A light correspondence can be found between the infrequent words and those with the longest first dwell time (Figure 7). However, the linear models are not significant, with an R² of 0.07 (first fixation) and 0.05 (first dwell).

Figure 7: First dwell time on keywords (corrected).

This difference between source code and natural-language text might be due to the very limited set of possible keywords (50 in Java) in contrast to the vast amounts of possible words in natural-language text. Moreover, in source code there is no ambiguity of words or sentences; on the contrary, it is highly structured text. In addition, in natural-language text it is much more likely to encounter a rare word that is unknown to the reader. The difficulty of reading source code lies probably more in the interaction of elements than in discriminating the meanings of the few keywords.

4 Conclusions

Some claim that programming is far more complex than human mental activities usually studied by psychologists [Weinberg and Schulman 1974]. Understanding of the mental processes of programmers when reading a source code has important implications for computing education and for the design of programming environments. Because of the lack of knowledge about code reading patterns, as well as differences between novice and expert programmers, we began conducting eye tracking studies to uncover the nuances of gaze behavior during code reading.

Source code is a different type of text than natural-language text. It is highly formal and structured, and has a very limited vocabulary of keywords, operators, and separators, yet tremendous combination possibilities for literals and identifiers. For visual attention on code elements, separators were found to be a special category of tokens, as they get less attention than the other categories. Keywords seem to be as crucial during code reading as identifiers, operators, and literals, providing substantial information for the comprehension process. Closer examination of keywords as one example of code elements shows that in contrast to reading natural-language text, element frequency is not a relevant factor in the variability of first fixation duration and first dwell time during code reading.

With regard to methodology, log-transformation and normalization to length are necessary for both sorts of text, as they share the right-skew of dwell times and the length-effect. Applying measures and analyses from natural-language text reading to source code seems to be a rich direction of study. Further studies are needed in order to find adequate reading strategies and to infer higher order cognitive processes.

References

Bednarik, R., Myller, N., Sutinen, E., and Tukiainen, M. 2006. Analyzing individual differences in program comprehension. Technology, Instruction, Cognition and Learning 3, 3/4, 205.

Bednarik, R. 2012. Expertise-dependent visual attention strategies develop over time during debugging with multiple code representations. Int. J. of Human-Computer Studies 70, 2, 143–155.

Binkley, D., Davis, M., Lawrie, D., Maletic, J. I., Morrell, C., and Sharif, B. 2012. The impact of identifier style on effort and comprehension. Empirical Software Engineering 18, 2 (May), 219–276.

Busjahn, T., and Schulte, C. 2013. The use of code reading in teaching programming. In Proc. of the 13th Koli Calling, ACM, 3–11.

Busjahn, T., Schulte, C., and Busjahn, A. 2011. Analysis of code reading to gain more insight in program comprehension. In Proc. of the 11th Koli Calling, ACM, 1–9.

Crosby, M. E., and Stelovsky, J. 1990. How do we read algorithms? a case study. Computer 23, 1, 24–35.

Crosby, M. E., Scholtz, J., and Wiedenbeck, S. 2002. The roles beacons play in comprehension for novice and expert programmers. In 14th Workshop of the PPIG, 58–73.

Gosling, J., Joy, B., Steele, G., Bracha, G., and Buckley, A. The java language specification. http://docs.oracle.com/javase/specs/jls/se7/html/index.html. [Accessed 2013-10-02].

Holmqvist, K., Nyström, M., Andersson, R., Dewhurst, R., Jarodzka, H., and van de Weijer, J. 2011. Eye tracking: A comprehensive guide to methods and measures. OUP Oxford.

Rayner, K., and Duffy, S. A. 1986. Lexical complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical ambiguity. Memory & Cognition 14, 3, 191–201.

Rayner, K. 1998. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124, 3, 372–422.

Uwano, H., Nakamura, M., Monden, A., and Matsumoto, K.-i. 2007. Exploiting eye movements for evaluating reviewer's performance in software review. IEICE Trans. on Fundamentals of Electronics Communications and Computer Sciences E90-A, 10, 2290–2300.

von Mayrhauser, A., and Lang, S. 1999. A coding scheme to support systematic analysis of software comprehension. IEEE Trans. on Software Engineering 25, 4, 526–540.

Weinberg, G. M., and Schulman, E. L. 1974. Goals and performance in computer programming. Human Factors: The J. of the Human Factors and Ergonomics Society 16, 1 (Feb.), 70–77.

Wiedenbeck, S., Fix, V., and Scholtz, J. 1993. Characteristics of the mental representations of novice and expert programmers: an empirical study. Int. J. of Man-Machine Studies 39, 5, 793–812.
