
Educational Psychology Review, Vol. 14, No. 1, March 2002 (© 2002)

Commentary

Towards an Integrated View of Learning From Text and Visual Displays

Wolfgang Schnotz1,2

Visuo-spatial text adjuncts such as static or animated pictures, geographic maps, thematic maps, graphs, and knowledge maps that have been analyzed in the articles contained in this special issue provide complex pictorial information that complements the verbal information of texts. These spatial text adjuncts are considered as depictive representations that can support communication, thinking, and learning. An essential precondition of this supportive function is that the visuo-spatial displays interact appropriately with human visual perception and the individual's cognitive system, which is characterized by prior knowledge, cognitive abilities, and learning skills. Accordingly, effective learning with visuo-spatial text adjuncts can be fostered by instructional design and by adequate processing strategies, both dependent on sufficient understanding of how the human cognitive system interacts with these displays. Perspectives for further research in this area are provided.

KEY WORDS: visual displays; spatial displays; adjunct displays; spatial cognition; representations.

Visual displays play an increasingly important role not only in our daily
life, but also in the field of learning and instruction where instructional materials today include more pictures, diagrams, and graphs than a few decades
ago. From a historical perspective, the use of pictorial information in learning
and instruction has a long tradition. In the seventeenth century, Comenius
(influenced by John Locke's sensualism) published his Didactica Magna,
1 Department of General and Educational Psychology, University of Koblenz-Landau, Landau, Germany.
2 Correspondence should be addressed to Wolfgang Schnotz, Department of General and Educational Psychology, University of Koblenz-Landau, Thomas-Nast-Street 44, D-76829 Landau, Germany; e-mail: schnotz@uni-landau.de.
© 2002 Plenum Publishing Corporation. 1040-726X/02/0300-0101/0


which emphasized that envisioning information is extremely important for effective learning. Scholars of educational science have followed these basic ideas for centuries. However, it is only since the 1970s that these ideas have been investigated in a systematic way on an empirical basis.
The articles in this special issue provide an excellent survey of this research especially with regard to the last decade. They focus on rather different kinds of visual displays: static and animated illustrations, geographic,
thematic and knowledge maps, and graphs. These visuals look very different
and can serve rather different purposes. Nevertheless, all of these spatial text
adjuncts have supportive effects on communication, thinking, and learning.
The articles of this volume follow a common intention: They specify under
which conditions and why these effects take place. To attain an integrated
picture of the empirical results and the underlying theoretical concepts, I
consider some representational issues with regard to visuo-spatial text adjuncts. Then, I briefly analyze the interplay of these spatial displays with human visual perception and higher order cognitive processing. Visual displays
are considered tools for communication, thinking, and learning that require
specific individual prerequisites (especially prior knowledge and cognitive
skills) in order to be used effectively. Based on this analysis I discuss instructional consequences with regard to design issues and with regard to
processing strategies. Finally, I discuss further perspectives for research on
learning from text and visuals.

REPRESENTATIONAL ISSUES
Symbols and Icons
Representations are objects or events that stand for something else
(Peterson, 1996). Texts and visual displays are external representations.
These external representations are understood when a reader or observer
constructs internal mental representations of the content described in the
text or shown in the picture. Comprehension is usually task-oriented. That is,
the mental construction is performed by the individual in a way that allows
him or her to deal effectively with current or anticipated requirements. In
other words, comprehension of text and pictures is a task-oriented construction of mental representations.
Text and visual displays are based on different sign systems. A fundamental distinction between different sign systems was introduced by Peirce
(1906): the differentiation between symbols and icons. According to Peirce,
symbols have an arbitrary structure and are associated with the designated object by a convention. Words and sentences of natural language are


examples of symbols. Icons, on the contrary, do not have an arbitrary structure. Instead, they are associated with the designated object by similarity.
Accordingly, all kinds of static as well as animated realistic pictures (or pictorial illustrations, respectively) and all kinds of geographic maps can be
considered icons. However, graphs and knowledge maps do not possess similarity with what they represent, and parts of their structure are specified
by convention. One could therefore argue that they are symbols rather than
icons. Nevertheless, graphs and knowledge maps have more in common with
icons than with symbols. This becomes obvious if one characterizes icons in
a more general way: Icons can be defined as signs that are associated with
their designated object by common structural properties. Similarity, then, is
only one kind of structural commonality that is typical for realistic pictures,
pictorial illustrations, and geographic maps. Graphs, on the contrary, are
characterized by a more abstract kind of structural commonality with the
designated object. Knowledge maps that visualize the macrostructure of a
learning content can be considered a pictorial display of the corresponding
knowledge structure.

Descriptive and Depictive Representations


According to the different sign systems on which they are based, texts
and visual displays belong to different classes of representations: descriptive and depictive representations. Texts (as well as mathematical equations,
e.g.) are descriptive representations. A descriptive representation consists
of symbols that have an arbitrary structure and that are associated with
the content they represent simply by means of a convention. If we describe
something in a text, we use nouns to refer to its parts and we use verbs
and prepositions to relate these parts to each other. Visual displays, on the
contrary, are depictive representations. A depictive representation consists
of iconic signs. These signs are associated with the content they represent
through common structural features on either a concrete or more abstract
level.
Representations can differ from one another with respect to their informational content and their usability. The informational content of a representation is the set of information that can be extracted from the representation with the help of available procedures (Palmer, 1978). Thus, the
informational content of a representation depends on both its structure and
on the procedures that operate on the structure. Two representations are
informationally equivalent if every information item that can be taken from
one representation can also be taken from the other representation (Larkin
and Simon, 1987). A piece of information can be relevant for some tasks and


irrelevant for other tasks, so it is possible to define the informational content of a representation with respect to a specific set of tasks. Accordingly,
two representations are (in a task-specific sense) informationally equivalent
if both allow the extraction of the same information required to solve the
specific tasks.
When two representations are informationally equivalent they can nevertheless differ in their usefulness. Representations are used to retrieve information about what they represent. Depending on the structure of the representation and the processes operating on it, information retrieval (which
often means the computation of new information) can be easy or difficult.
Representations, which are not only informationally equivalent, but also
equivalent in terms of retrieving information, are referred to as computationally equivalent (Larkin and Simon, 1987). Two representations are (in
a task-specific sense) computationally equivalent if each task-relevant information can be retrieved from one representation as easily as from the
other representation. Shah and Hoeffner (this issue) address this issue with
regard to graph design. They argue that there is no specific graph format
that is generally better than others. Designing graphs or any other external representations always requires taking into account the interplay between the representation and the task demands. The relevant questions are:
What kind of procedures have to be performed to solve the task, and how
easily can these procedures be performed with the given representation
structure?
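Larkin and Simon's distinction can be illustrated outside the article's domain with a small, hypothetical Python example (the data and function names are invented for illustration): two representations that hold exactly the same information, yet make the task "is x present?" unequally easy to perform.

```python
# Toy illustration (not from the article): two representations of the same
# collection of numbers. They are informationally equivalent -- every item
# in one can be recovered from the other -- but for a membership query they
# are not computationally equivalent: the list must be scanned item by item,
# while the hash set answers in roughly constant time per query.

data_list = [17, 3, 42, 8, 23]   # sequential (list) representation
data_set = set(data_list)        # hash-based representation, same content

def contains_list(xs, x):
    """Membership by linear scan: cost grows with the number of items."""
    for item in xs:
        if item == x:
            return True
    return False

def contains_set(s, x):
    """Membership by hashing: roughly constant cost per query."""
    return x in s

# Same answers from both representations -- informational equivalence ...
assert contains_list(data_list, 42) == contains_set(data_set, 42) == True
assert contains_list(data_list, 99) == contains_set(data_set, 99) == False
# ... but different retrieval effort -- no computational equivalence.
```

The same information item is "taken from" either structure, but the procedures that operate on the structures differ in cost, which is precisely the task-specific sense in which the two representations fail to be computationally equivalent.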
Descriptive representations and depictive representations have different uses for different purposes. Descriptive representations have a higher
representational power than depictive representations. For example, there
is no problem in a descriptive representation to express a general negation
("No pets allowed!") or a general disjunction ("Seat reserved for infirm people
and for mothers with babies"). In a depictive representation, however, one
can express only specific negations (e.g., a picture showing a dog combined
with a prohibitive sign). Disjunctions are depicted through a series of pictures (e.g., a picture showing an old man plus a picture showing a mother
with her baby). On the other hand, depictive representations encompass
a specific class of information in its entirety. For example, it is possible to
read from a geometric figure (such as a triangle) all its geometric properties. Similarly, a picture of an object is not limited to information about its
form, but also has information about its size and its orientation in space.
In contrast, in a description it is possible to mention only a few geometric
characteristics of a figure or to specify only the form of the object without
providing information about its size or orientation. Accordingly, depictive
representations are especially useful to gain new information from already
known information. A depiction constructed on the basis of already known


information contains further information that has not been made explicit
so far (Kosslyn, 1994). If one draws a triangle based on information about
two sides and one angle, one can read the size of the third side, the size
of the other two angles, the area of the triangle, and many more geometric
characteristics. The new information is not generated in the sense of a logical
conclusion, but rather can be read directly from the representation (Johnson-Laird, 1983). These have sometimes been called pseudo-inferences (Garrod,
1985).
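The triangle example can be made concrete with a short computation. The following sketch is purely illustrative (it is not part of the article): given two sides and the included angle, the "read-off" quantities follow from the law of cosines and the law of sines.

```python
import math

def complete_triangle(a, b, gamma_deg):
    """Given two sides a, b and the included angle gamma (degrees),
    compute what one could 'read off' a drawn triangle: the third
    side, the two remaining angles, and the area."""
    gamma = math.radians(gamma_deg)
    # Law of cosines: c^2 = a^2 + b^2 - 2ab*cos(gamma)
    c = math.sqrt(a**2 + b**2 - 2 * a * b * math.cos(gamma))
    # Law of sines for one remaining angle; the angle sum gives the other
    alpha = math.degrees(math.asin(a * math.sin(gamma) / c))
    beta = 180.0 - gamma_deg - alpha
    area = 0.5 * a * b * math.sin(gamma)
    return c, alpha, beta, area

# A 3-4-5 right triangle: the 'new' information was implicit in the givens.
c, alpha, beta, area = complete_triangle(3, 4, 90)
print(round(c, 2), round(alpha, 2), round(beta, 2), round(area, 2))
# prints: 5.0 36.87 53.13 6.0
```

The point of the pseudo-inference idea is that a person reading these values off a drawn triangle performs no such explicit derivation; the depiction makes the information available directly.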

Mental Representations
The distinction between descriptions and depictions can be applied not
only to external representations such as texts and pictures, but also to internal
mental representations, which are constructed during text and picture comprehension. Current approaches in text comprehension research assume that
in understanding a text a reader constructs multiple mental representations.
The representations include a surface representation of the text, a propositional text base, a mental model of what the text is about, a communication
level, and a genre level (Graesser et al., 1997). The text surface representation includes the detailed linguistic information, such as the specific words,
phrases, and syntactic structures. The text base represents the semantic content of the text in the form of propositions. The mental model represents the
referential content of the text. In narrative texts this is frequently referred
to as a situation model (van Dijk and Kintsch, 1983). The mental model is
constrained both by the text base and by domain-specific world knowledge.
The communication level represents the pragmatic context of the communication between reader and writer. The genre level captures knowledge about
the class of text and its corresponding text function. Evidence for a differentiation between the surface code, the text base, and the mental model level
has been found in several investigations (Kintsch et al., 1990; Schmalhofer
and Glavanov, 1986).
In picture comprehension, the individual also constructs multiple mental representations. These include a surface structure representation, a mental model, a propositional representation as well as a communication level
and a genre level representation. The surface structure representation corresponds to the perceptual (visual) image of the picture in the individual's
mind. The mental model represents the subject matter shown in the picture on the basis of common structural features (i.e., based on an analogy)
between the picture and its referential content. The propositional representation contains information that is read from the model and that is encoded in a propositional format. The communication level represents the


pragmatic context of the pictorial communication, whereas the genre level


represents knowledge about the class of pictures and their corresponding
functions.
Propositional representations, whether constructed during text comprehension or during picture comprehension, are descriptive representations.
They consist of internal symbols, which can be decomposed, similar to sentences of natural language, into simple symbols. Accordingly, propositional
representations are symbolic representations. Propositional representations
can be viewed as internal descriptions in the language of the mind (Chafe,
1994).
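The idea of a propositional representation as an internal description can be sketched in miniature. The predicate and argument names below are invented for the example, not taken from the article:

```python
from collections import namedtuple

# A proposition: a predicate applied to ordered arguments. Like the words
# of natural language, the tokens are arbitrary symbols -- they stand for
# their referents by convention, not by structural similarity.
Proposition = namedtuple("Proposition", ["predicate", "arguments"])

# A small propositional text base for "The brake pad presses on the drum":
text_base = [
    Proposition("PRESS-ON", ("brake-pad", "drum")),
    Proposition("PART-OF", ("brake-pad", "brake")),
]

# As the text notes, propositions decompose into simple symbols:
for p in text_base:
    print(p.predicate, *p.arguments)
```

Nothing in the tuple ("PRESS-ON", "brake-pad", "drum") resembles a brake; the representation describes rather than depicts, which is exactly what distinguishes it from the mental model discussed next.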
The perceptual images created as surface structure representations during picture comprehension are internal depictive representations. They retain structural characteristics of the picture and use these inherent structural
characteristics as a means of representation. Perceptual images created in
picture comprehension are sensory specific because they are linked to the
visual modality. The proximity of these images to perception can be attributed to the fact that visual images and visual perceptions are based on
the same cognitive mechanisms (Kosslyn, 1994). Mental models, whether
constructed during picture comprehension or during text comprehension,
are also internal depictive representations, as they have inherent structural
features in common with the depicted object. That is, they represent the
object based on a structural or functional analogy (Johnson-Laird, 1983;
Johnson-Laird and Byrne, 1991). Such an analogy does not imply that mental models represent only spatial information. A mental model can also represent, for example, the increase or decrease of birth rates or incomes
during a specific period of time (as it can be described in a text or displayed
in a line graph), although birth rates and incomes are certainly not spatial
information.
Contrary to visual images, mental models are not sensory specific. For
example, a mental model of a spatial configuration (say, of a room) can be
constructed not only by visual perception, but also by auditory, kinesthetic, or
haptic perception. Because mental models are not bound to specific sensory
modalities, they can be considered as more abstract than perceptual images.
On the one hand, a mental model constructed from a picture contains less
information than the corresponding visual image because of its abstraction.
That is, irrelevant pictorial details that are included in the visual image are
omitted from the mental model. On the other hand, the mental model contains more information than the corresponding visual image because it also
includes prior knowledge that is not present in the visual perception. For
example, a mental model of a brake can contain information about causal
relationships that are not explicitly included in the corresponding picture of
the brake (Mayer, 1997).


THEORIES OF LEARNING FROM TEXTS AND VISUAL DISPLAYS


Dual Coding Theory, Conjoint Processing Theory,
and Multimedia Learning Theory
While text comprehension has been investigated rather intensively during the last 25 years (Graesser et al., 1997), research on comprehension of
visual displays has received much less attention. It is a special merit of the
authors of this special issue to address explicitly comprehension of visual
displays as spatial text adjuncts.
Former studies on text and picture comprehension focused primarily on
the mnemonic function of pictures in texts. The main result of these studies
was that text information is remembered better when it is illustrated by
pictures than when there is no illustration (Levie and Lentz, 1982; Levin et al.,
1987). Carney and Levin (this issue) emphasize that research throughout the
1990s has also demonstrated that carefully constructed pictures as visual text
adjuncts can not only have a decorative function, but also have functions of
representation, organization, interpretation, and mnemonic encoding (what
they refer to as a transformation function).
The facilitation of pictures on learning from text was usually explained
by Paivio's dual coding theory (Clark and Paivio, 1991; Paivio, 1986). According to this theory, verbal information and pictorial information are processed in different cognitive subsystems: a verbal system and an imagery
system. Words and sentences are usually processed and encoded only in
the verbal system, whereas pictures are processed and encoded both in the
imagery system and in the verbal system. Thus, the superior memory for pictorial information and the memory-enhancing effect of pictures in texts are
ascribed to the advantage of dual coding as compared to single coding in
memory. As Verdi and Kulhavy (this issue) point out, the mnemonic function of maps combined with texts can also be explained according to the
conjoint processing theory (Kulhavy et al., 1993). They emphasize the simultaneous availability of text information and pictorial information in working
memory: An intact map requires only little working memory capacity and therefore leaves enough capacity for processing text information.
Thus, verbal information and pictorial information can be kept simultaneously in working memory and, accordingly, it is easier for the learner to
make cross-connections between the two different codes and later retrieve the information.
Mayer (1997) has developed a model of multimedia learning that combines the assumptions of dual coding theory with the notion of multilevel
mental representations. A main assumption of Mayer's model is that verbal
and pictorial information are processed in different cognitive subsystems


and that processing results in the parallel construction of two kinds of mental
models that are finally mapped onto each other. Accordingly, an individual
understanding a text with pictures selects relevant words, constructs a propositional representation or text base, and then organizes the selected verbal
information into a verbal mental model of the situation described in the
text. Similarly, the individual selects relevant images, creates what is called a
pictorial representation or image base, and organizes the selected pictorial
information into a visual mental model of the situation shown in the picture.
The final step is to build connections through a one-to-one mapping between
the text-based model and the picture-based model. Integrative processing
is most likely to occur if verbal and visual information are simultaneously
available in working memory, that is, the corresponding entities in the two
models are mentally available at the same time (Baddeley, 1992; Chandler
and Sweller, 1991).
An Integrative Model of Text and Picture Comprehension
The parallelism of text processing and picture processing assumed in
Mayer's model is problematic, however, because texts and pictures are based
on different sign systems and use quite different principles of representation.
Thus, Schnotz and Bannert (1999) have proposed an integrative model of text
and picture comprehension that gives more emphasis to representational
principles (cf. Schnotz, 2001). An outline of this model is shown in Fig. 1.
It consists of a descriptive (left side) and a depictive (right side) branch
of representations. The descriptive branch comprises the (external) text,
the (internal) mental representation of the text surface structure, and the
propositional representation of the text's semantic content. The interaction
between these descriptive representations is based on symbol processing.
The depictive branch comprises the (external) picture, the (internal) visual
perception or image of the picture, and the (also internal) mental model of
the subject matter presented in the picture. The interaction between these
depictive representations is based on processes of structure mapping due to
the structural correspondences (i.e., analogy relations) between the representations (Gentner, 1989).
In text comprehension, the reader constructs a mental representation of
the text surface structure, generates a propositional representation of the semantic content, and constructs from this so-called text base a mental model
of the described subject matter (van Dijk and Kintsch, 1983; Schnotz, 1994;
Weaver et al., 1995). These construction processes are based on an interaction of bottom-up and top-down activation of cognitive schemata that have
both a selective and an organizing function. The selection of task-relevant
information is performed by top-down processing, whereas the organizing


Fig. 1. Schematic illustration of an integrative model of text and picture comprehension.

function is based on the interaction of bottom-up and top-down processing. This interaction results in a specific configuration of activated cognitive
schemata that fits best to the incoming information and organizes it into a coherent structure. Text information is processed with regard to morphologic
and syntactic aspects by verbal organization processes that lead to a mental


representation of the text surface structure. This text surface structure in turn triggers conceptual organization processes that result in a structured propositional representation and a mental model.
Picture comprehension is based on a specific interplay between visual
perception and higher-order cognitive processing. In picture comprehension,
the individual first creates through perceptual processing a visual mental representation of the picture's graphic display. Then, the individual constructs
through semantic processing a mental model and a propositional representation of the subject matter shown in the picture. In perceptual processing, task-relevant information is selected through top-down activation of
cognitive schemata and then visually organized through automated visual
routines (Ullman, 1984). Perceptual processing includes identification and
discrimination of graphic entities, as well as the visual organization of these
entities according to the Gestalt laws (Wertheimer, 1938; Winn, 1994). The
resulting mental representation is the visual perception of the picture in the
imagery part of working memory, the so-called visual sketchpad (Baddeley,
1992; Kruley et al., 1994; Sims and Hegarty, 1997). Perception and imagery
are based on the same cognitive mechanisms; therefore, the same kind of
representation can also be referred to as a perceptual image if the representation is created on the basis of internal world knowledge rather than
external sensory data (Kosslyn, 1994; Shepard, 1984).
Semantic processing is required to understand a picture as opposed to
merely perceiving it. During this process, the individual constructs a mental model of the depicted subject matter through a schema-driven mapping
process in which graphic entities are mapped onto mental model entities
and spatial relations are mapped onto semantic relations. In other words,
picture comprehension is a process of analogical structure mapping between a system of visuo-spatial relations and a system of semantic relations
(Falkenhainer et al., 1989–90; Schnotz, 1993). This mapping can take place
in both directions; it is possible to construct a mental model bottom-up from
a picture, and it is also possible to evaluate an existing mental model top-down with a picture. While understanding pictorial illustrations or maps, the
individual can use cognitive schemata of everyday perception. While understanding graphs and knowledge maps, however, the individual requires
specific cognitive schemata (so-called graphic schemata) in order to be able
to read off information from the visuo-spatial configuration (Lowe, 1993;
Pinker, 1990).
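The schema-driven mapping of graphic entities onto mental model entities, and of spatial relations onto semantic relations, can be sketched as a small lookup. This is a toy illustration only; the curve names and relation labels are invented for the example:

```python
# Illustrative sketch (names invented): reading a line graph via a graphic
# schema. Graphic entities map onto semantic entities, and visuo-spatial
# relations between them map onto semantic relations.
graphic_entities = {
    "upper_curve": "birth rate",
    "lower_curve": "death rate",
}
spatial_to_semantic = {
    "is_above": "is larger than",
    "rises_left_to_right": "increases over time",
}

def interpret(entity, relation, other):
    """Map a visuo-spatial statement about the display onto a semantic one."""
    return (f"{graphic_entities[entity]} "
            f"{spatial_to_semantic[relation]} "
            f"{graphic_entities[other]}")

print(interpret("upper_curve", "is_above", "lower_curve"))
# prints: birth rate is larger than death rate
```

A viewer without the relevant graphic schema (the two lookup tables above) could still see the curves but could not read off the semantic relation, which is the sense in which graphs, unlike everyday pictures, require learned schemata.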
When a mental model has been constructed, new information can be
read from the model through a process of model inspection. The new information gained in this way is made explicit by encoding it in a propositional
format. The new propositional information is used to elaborate the propositional representation. In other words, there is a continuous interaction between the propositional representation and the mental model (Baddeley,


1992). In text comprehension, the starting point of this interaction is a propositional representation, which is used to construct a mental model. When understanding pictures, the starting point of the interaction is a mental model, which is used to read new information that is added to the propositional representation. Besides the interaction between the propositional
representation and the mental model, there may also be an interaction between the text surface representation and the mental model, and between
the perceptual representation of the picture and the propositional representation. This is shown in Fig. 1 by the dotted diagonal arrows.
Accordingly, there is no one-to-one relationship between external and
internal representations. A text as an external descriptive representation
leads to both an internal descriptive and an internal depictive mental representation. A picture, on the other hand, as an external depictive representation leads to both an internal depictive and an internal descriptive mental
representation.
Formally, one can consider the construction of a propositional representation and of a mental model as a kind of dual coding. Nevertheless,
my view is fundamentally different from the traditional dual coding theory.
First, dual coding presumably applies not only to the processing of pictures,
but also to the processing of words and texts. Second, the construction of a
mental model is regarded as more than simply adding a further code that
elaborates the mental representation and provides a quantitative advantage
compared to a single code. Rather, the essential point is that propositional
representations and mental models are based on different sign systems and
different principles of representation that complement one another.

Feature Information and Structure Information


Verdi and Kulhavy (this issue) point out that two kinds of information
can be distinguished in visual displays: feature information and structure
information. Feature information is provided by symbols or pictograms representing an external referent. It tells us what exists at a specific place. Furthermore, feature information helps to activate appropriate prior knowledge
schemata. Structure information is provided by the place where a specific
feature is located on the display. To summarize: Feature and structure information tell the observer what exists (or happened) where.
The distinction between what and where is also supported by results of
brain research and the practice of neurology. Visual information about object
attributes and visual information about spatial configurations is processed
in different cognitive subsystems: a what-system and a where-system. The
what-system contains knowledge about the appearance of objects and is
used to identify objects. The where-system contains knowledge about spatial


directions and distances between objects, that is, knowledge about spatial
structures and their location (Kosslyn, 1991). One can find patients with
partial brain damage who can localize objects but are unable to say what
these objects are, and one can find patients who can identify objects but
cannot localize them (Farah et al., 1988).
TOOLS FOR COMMUNICATION, THINKING, AND LEARNING
The articles in this special issue deal with various kinds of visual displays: static or animated pictures, geographic maps, thematic maps, graphs,
and knowledge maps. These visual displays can be considered as complex
pictorial signs, and like other kinds of signs, they can help to communicate information and support thinking or learning processes. Like sentences
of natural language (as complex verbal signs), these pictorial signs can be
analyzed under a syntactic, a semantic, and a pragmatic perspective. The
syntactic perspective deals with the well-formedness of signs. The semantic perspective deals with the meaning of the pictorial signs. The pragmatic
perspective deals with the use of pictorial signs in communication, thinking,
and learning.
Syntactic constraints on the well-formedness of visual displays derive
from the need to maintain similarity or a structural commonality with what
they represent and from the requirements of human perception. The syntactic constraints of pictorial illustrations and geographic maps are based on
similarity (cf. Carney and Levin, this issue; Verdi and Kulhavy, this issue).
The syntactic constraints of graphs and knowledge maps derive from the
conventional representation formats (e.g., pie charts, bar charts, line graphs,
scatter plots, and box plots), from structural commonalities with the represented subject matter, and from the mechanism of human visual perception,
especially the Gestalt laws. As O'Donnell et al. (this issue) point out, knowledge maps that were constructed according to the Gestalt laws resulted in
better learning than other kinds of knowledge maps. Semantic constraints of
visual displays are implicitly addressed by all the contributors when they analyze conditions that make comprehension and learning with these displays
easier. Finally, all articles deal with the pragmatic perspective on pictorial
signs.
When Carney and Levin (this issue) distinguish between representation, organization, interpretation, transformation, and decoration as possible functions of pictorial illustrations, they refer to the pragmatic perspective.
Similarly, Verdi and Kulhavy (this issue) point out the facilitative function
of maps. As an amendment, it should be noted that both pictorial illustrations and maps can also serve as tools for thinking. An example is the use of
pictures by Abraham Wald during World War II: In order to find out which
areas of airplanes required more armor, he copied the bullet holes from a large number of returning aircraft onto an outline picture of the airplane and put extra armor everywhere else (Wainer, 1992). Another example is
Dr. John Snow's use of a map of Central London in 1854, when he plotted the locations of deaths from cholera and found that the disease came from the Broad Street water pump (Tufte, 1983). O'Donnell et al. (this issue) also
report about the use of knowledge maps as a tool for communication and
thinking: These maps can be used, for example, in a counselling setting where
a counseller and a client try to attain a common understanding of a problem
situation. Similarly, graphs can be used both by novices and experts in order
to communicate about a problem. Shah and Hoeffner (this issue) emphasize
that graphs should generally be designed according to their intended usage.
If individuals are to understand the interaction between three variables, for example, a three-dimensional display may be well suited, whereas it would be a poor choice if they are to read exact values from the graph.

INDIVIDUAL DIFFERENCES
Visuo-spatial text adjuncts and other forms of visual displays can support communication, thinking, and learning only if they interact appropriately with the individual's cognitive system. Accordingly, the effects of
visuo-spatial adjunct aids depend on prior knowledge, cognitive abilities,
and learning skills. These factors are, of course, age-dependent. Children in
the kindergarten age range are generally skilled in understanding realistic
pictures, whereas verbal literacy (as a result of learning to read and a prerequisite of reading to learn) is attained in primary school. Finally, visual literacy,
which includes understanding graphs, is acquired (if at all) still later (Shah
and Hoeffner, this issue).
Carney and Levin (this issue) point out that pictorial illustrations can
have a decorative and motivational function in materials for first graders
who learn to read. However, these pictures should not illustrate what children are expected to understand from reading the text. Individuals seem to
be experts in cognitive economy. They are therefore skillful in finding shortcuts for solving cognitive tasks. Generally speaking, one should not provide
alternative routes for understanding when the learner should be trained in
understanding a specific kind of representation.
Among readers, visual displays can have a supporting function for
understanding and learning difficult materials. The more difficult the learning content is, the more frequently the learner looks at adjunct
visual displays (Carney and Levin, this issue). The supportive function of
visuo-spatial adjuncts seems to be especially evident with learners of low
prior knowledge and low verbal skills. Previous research has pointed out
that comprehension among learners with low domain knowledge (but sufficient visuo-spatial cognitive skills) is increased when pictures are added to a
text. Learners with high domain knowledge, by contrast, are able to construct a mental model without pictorial support (Mayer, 1997). Carney and
Levin (this issue) draw a similar conclusion when they argue that a text that is
simple and can be easily envisioned by the learner does not need additional
pictures. However, if the subject matter is complex and/or if learners have low
prior knowledge, then visual displays increase comprehension. This is true
not only for pictorial illustrations, but also for knowledge maps. Knowledge
maps are especially helpful for learners with low prior knowledge and for
learners with low verbal skills (O'Donnell et al., this issue).
Verdi and Kulhavy (this issue) also emphasize the role of prior knowledge: Learners with high prior knowledge recall map information better than
learners with low prior knowledge. However, maps are highly familiar both
for novices and for experts. Both groups show comparable ability in processing map information. Abstract kinds of visual displays such as graphs,
however, require knowledge about specific forms of representations. The
individual has to acquire specific cognitive schemata (graph-schemata) in
order to understand these so-called logical pictures (Pinker, 1990; Shah and Hoeffner, this issue).

INSTRUCTIONAL CONSEQUENCES
Effective learning with visuo-spatial text adjuncts can be fostered
through instructional design by the teacher or author of instructional material and through adequate processing strategies by the learner. The contributions to this special issue include both of these perspectives.

Instructional Design
All contributors agree that effective learning with visuo-spatial text
adjuncts is not dependent on the professional appearance of visuals, but
rather on the relation between these displays and the task demands and on
the learner's prior knowledge and cognitive abilities. Instructional design of
visual displays therefore requires sufficient understanding of how the human
cognitive system interacts with these displays.
The authors agree on various points with regard to instructional design:
First, if verbal and pictorial information is provided to learners, both kinds
of information should be coherent with some semantic overlap (Carney
and Levin, this issue; Shah and Hoeffner, this issue; Mayer and Moreno,
this issue). Second, both kinds of information should enter working memory simultaneously in order to make interconnections between them more
likely; simultaneous availability of information requires spatial or temporal contiguity (Mayer and Moreno, this issue). Third, semantic processing
of verbal and pictorial information requires activation of thematically related prior knowledge. Access to prior knowledge is facilitated in comprehending geographic or thematic maps and graphs if meaningful symbols,
colors, or icons are used that can be easily associated with their referent
(Verdi and Kulhavy, this issue). Access to prior knowledge is more difficult if a legend is used in a map or a graph because this requires an additional step in order to associate a color or a visual pattern with its external referent (Shah and Hoeffner, this issue). Fourth, if possible, verbal
and pictorial information should not both enter working memory through the visual channel, in order to avoid cognitive overload. Fifth, the same verbal
information should not be presented simultaneously through the visual and
the auditory channel (Mayer and Moreno, this issue). There is also agreement that visual displays, ranging from concrete pictorial illustrations to abstract graphs or knowledge maps, should be designed according to the
requirements of the human perceptual apparatus. They should, for example, include visual features that can be easily distinguished, and they should
arrange visual features according to the Gestalt laws. Finally, visual displays
should be designed according to the aim of communication or of teaching and
learning.
Sometimes, text and picture cannot be presented simultaneously. Based
on their own research, Verdi and Kulhavy (this issue) suggest that in this case
the picture should be presented first and the text later. The authors argue that
when text processing occurs first, most of the capacity of working memory is used, leaving little capacity for processing the following picture. Processing a
picture first requires little space in working memory and, thus, leaves enough
capacity for processing text. However, I believe that an alternative and probably simpler explanation would be that a text never describes a
subject matter with enough detail to allow only one kind of envisioning. A
mental model or visual image constructed only from the text is therefore
likely to differ from the picture presented afterwards and, thus, to interfere with the picture (cf. Fig. 1). This kind of interference can be avoided by
presenting the picture before the text.
Processing Strategies
Visual displays can support communication, thinking, and learning.
However, they do not provide this support automatically. Learners often
underestimate the informational content of pictures and believe that a short look would be enough for understanding and for extracting the relevant information (Mokros and Tinker, 1987; Weidenmann, 1989). These individuals
do not engage in a schema-driven analysis of the depictive representation,
do not read off enough information, and thus do not elaborate their propositional representation of the subject matter sufficiently (cf. Fig. 1). Thus,
it is not enough that learners possess the cognitive schemata of everyday
knowledge required for understanding pictorial illustrations or the cognitive
schemata required for understanding graphs. These schemata also must be
activated (Shuell, 1988).
All articles contained in this special issue emphasize the importance of
active cognitive processing of visuo-spatial adjuncts requiring appropriate
processing strategies. Carney and Levin (this issue) show that the functions
of representation, organization, interpretation, and transformation require
appropriate encoding. Similarly, animations are only beneficial for learning
if the individual engages in active cognitive processing (Mayer and Moreno,
this issue). Shah and Hoeffner (this issue) point out that learners need visual literacy.
They emphasize that individuals have difficulty mapping one form of representation into another one, and argue that learning from graphs should be
considered a special metacognitive task. The authors also make suggestions as to
how individuals could attain visual literacy. Several authors also emphasize
that learners should operate with visual displays in an active way that results
in a controllable product.
The effectiveness of knowledge maps also seems to result from active
cognitive processing. Knowledge maps require that learners use a restricted
set of semantic relations: The individual has to subsume specific semantic
relations from the text under one of the higher-order relations provided
in knowledge maps. This requires deeper semantic processing than simply
copying the name of a semantic relation from the text to a link in the conceptual map. O'Donnell et al. (this issue) report that the spatial layout of a knowledge map organized in a left-to-right order can mistakenly trigger a simple text reading strategy. In this case, the knowledge map can be harmful for
learning because the map is processed too superficially. The authors show
that active construction of knowledge maps by learners helps them to reflect on the material more deeply and enables them to communicate the
content more clearly. The use of knowledge maps requires specific strategies
that must be learned. It is remarkable that even brief training frequently
results in positive transfer even in learning situations where no knowledge
maps are available. One can assume that such training helps learners focus
on the structural aspects of knowledge even without an external knowledge
map. In other words, learners acquire a general ability to structure and organize knowledge as a general cognitive tool.
FURTHER PERSPECTIVES
The articles contained in this special issue show that visual displays are
powerful devices to support teaching and learning as well as other kinds
of communication. There is converging evidence that specific principles of
designing visuals and of combining them with texts are important to support
comprehension and learning. There is also converging evidence that prior
knowledge about representation formats and active processing of visuals
based on adequate strategies are crucial for effective support of comprehension and learning. Nevertheless, there are still a number of open questions
that require investigation.
Learning from verbal and pictorial information has generally been considered (potentially) beneficial. However, research on knowledge acquisition from multiple representations has made it obvious that the use of multiple representational formats has not only cognitive benefits but also cognitive costs (Ainsworth, 1999). Learning from verbal and
pictorial information has also frequently been associated with individual
representational preferences and cognitive styles. Examples of this are the
distinction between visualizers and verbalizers and between field independence and field dependence (cf. Verdi and Kulhavy, this issue). Research
on the relevance of such preferences and cognitive styles, however, has not
attained clear results yet, and it is unknown whether matching the learner's individual preferences will really result in better learning. Accordingly, it remains an open question whether we need to adapt texts and visuo-spatial adjuncts to the aptitude-treatment effects hypothesized by some
researchers.
The development of new technologies is a specific challenge for the
use of verbal and pictorial information in learning and instruction. While
traditional print material allows only static visual displays to be presented,
computer-based instruction makes it possible to show animated displays.
Many practitioners and researchers consider animation an ideal form for
presenting change and development. Empirical results, however, do not
generally support this assumption. Further research on the conditions for
using animations effectively is required. This research should be based on
a well-supported cognitive theory (cf. Mayer and Moreno, this issue). The
development of new technologies also casts some well-known kinds of visual displays into a new light. Knowledge maps, for example, can be used not only as scaffolds for generating semantic macrostructures, but also as external visual models of an information space. Thus, tools for
communication, thinking, and learning also become tools for information
search. Despite these developments, I doubt whether we have to repeat the
research on learning from verbal and pictorial information with print media
under the conditions of the new electronic media. I also doubt that the design
principles for the new media will be fundamentally different from the design
principles developed for the traditional print media. The essential point in
this context is whether there are really new qualities emerging from the use
of new technologies that are relevant for cognitive processing.
Another essential point is whether and in what respect learners might
differ in the future from today's learners. The general constraints of the
human cognitive system will certainly not change as a result of new technologies. However, future learners could have new attitudes and processing
habits. As humans are exposed to an increasing mass of information that
frequently dazzles the eyes, ears, and mind, new standards of presenting information emerge. For example, television stations present short, dynamic,
and entertaining information sequences, and most mass media provide an
increasing amount of pictorial information that allows easy and rapid information processing. One can assume that learners who have much experience
with electronic media and with new kinds of information presentation might
have new expectations, new attitudes, and new processing habits that affect
their cognitive processing.
Cognitive processing, however, is only one factor that contributes to
effective learning. Affective and motivational factors must be considered
as well. If new media have appeal for young learners and if these learners are motivated to interact with a computer-based learning environment
longer than with traditional print materials (because it is more fun), then
this could justify the use of new technologies even if the cognitive effects were about the same as with traditional print media. Research on
learning from text with visuo-spatial adjuncts will have to be conducted not
only from a cognitive, but also from an affective, motivational, and social
perspective to reach adequate educational decisions.
REFERENCES
Ainsworth, S. (1999). The functions of multiple representations. Comput. Educ. 33: 131–152.
Baddeley, A. (1992). Working memory. Science 255: 556–559.
Chafe, W. L. (1994). Discourse, Consciousness, and Time, University of Chicago Press, Chicago.
Chandler, P., and Sweller, J. (1991). Cognitive load theory and the format of instruction. Cogn. Instr. 8: 293–332.
Clark, J. M., and Paivio, A. (1991). Dual coding theory and education. Educ. Psychol. Rev. 3: 149–210.
Falkenhainer, B., Forbus, K. D., and Gentner, D. (1989–90). The structure-mapping engine: Algorithm and examples. Artif. Intell. 41: 1–63.
Farah, M. J., Hammond, K. M., Levine, D. N., and Calvanio, R. (1988). Visual and spatial mental imagery: Dissociable systems of representation. Cogn. Psychol. 20: 439–462.
Garrod, S. C. (1985). Incremental pragmatic interpretation versus occasional inferencing during fluent reading. In Rickheit, G., and Strohner, H. (eds.), Inferences in Text Processing, North-Holland, Amsterdam, pp. 161–181.
Gentner, D. (1989). The mechanisms of analogical learning. In Vosniadou, S., and Ortony, A. (eds.), Similarity and Analogical Reasoning, Cambridge University Press, Cambridge, England, pp. 197–241.
Graesser, A. C., Millis, K. K., and Zwaan, R. A. (1997). Discourse comprehension. Annu. Rev. Psychol. 48: 163–189.
Johnson-Laird, P. N. (1983). Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness, Cambridge University Press, Cambridge, England.
Johnson-Laird, P. N., and Byrne, R. M. J. (1991). Deduction, Erlbaum, Hillsdale, NJ.
Kintsch, W., Welsch, D., Schmalhofer, F., and Zimny, S. (1990). Sentence memory: A theoretical analysis. J. Mem. Lang. 29: 133–159.
Kosslyn, S. M. (1991). A cognitive neuroscience of visual cognition: Further developments. In Logie, R. H., and Denis, M. (eds.), Mental Images in Human Cognition, North-Holland, Amsterdam, pp. 351–381.
Kosslyn, S. M. (1994). Image and Brain: The Resolution of the Imagery Debate, MIT Press, Cambridge, MA.
Kruley, P., Sciama, S. C., and Glenberg, A. M. (1994). On-line processing of textual illustrations in the visuospatial sketchpad: Evidence from dual-task studies. Mem. Cogn. 22: 261–272.
Kulhavy, R. W., Stock, W. A., and Kealy, W. A. (1993). How geographic maps increase recall of instructional text. Educ. Technol. Res. Dev. 41: 47–62.
Larkin, J. H., and Simon, H. A. (1987). Why a diagram is (sometimes) worth ten thousand words. Cogn. Sci. 11: 65–99.
Levie, H. W., and Lentz, R. (1982). Effects of text illustrations: A review of research. Educ. Commun. Technol. J. 30: 195–232.
Levin, J. R., Anglin, G. J., and Carney, R. N. (1987). On empirically validating functions of pictures in prose. In Willows, D. M., and Houghton, H. A. (eds.), The Psychology of Illustration, Vol. 1, Springer, New York, pp. 51–86.
Lowe, R. K. (1993). Constructing a mental representation from an abstract technical diagram. Learn. Instr. 3(3): 157–179.
Mayer, R. E. (1997). Multimedia learning: Are we asking the right questions? Educ. Psychol. 32: 1–19.
Mokros, J. R., and Tinker, R. F. (1987). The impact of microcomputer based labs on children's ability to interpret graphs. J. Res. Sci. Teach. 24(4): 369–383.
Paivio, A. (1986). Mental Representations: A Dual Coding Approach, Oxford University Press, Oxford, England.
Palmer, S. E. (1978). Fundamental aspects of cognitive representation. In Rosch, E., and Lloyd, B. B. (eds.), Cognition and Categorization, Erlbaum, Hillsdale, NJ, pp. 259–303.
Peirce, C. S. (1906). Prolegomena to an apology for pragmaticism. Monist 492–546.
Peterson, D. (1996). Forms of Representation, Intellect, Exeter.
Pinker, S. (1990). A theory of graph comprehension. In Freedle, R. (ed.), Artificial Intelligence and the Future of Testing, Erlbaum, Hillsdale, NJ, pp. 73–126.
Schmalhofer, F., and Glavanov, D. (1986). Three components of understanding a programmer's manual: Verbatim, propositional, and situational representations. J. Mem. Lang. 25: 279–294.
Schnotz, W. (1993). On the relation between dual coding and mental models in graphics comprehension. Learn. Instr. 3: 247–249.
Schnotz, W. (1994). Aufbau von Wissensstrukturen. Untersuchungen zur Kohärenzbildung beim Wissenserwerb mit Texten, Psychologie Verlags Union, Weinheim.
Schnotz, W. (2001). Sign systems, technologies, and the acquisition of knowledge. In Rouet, J. F., Levonen, J., and Biardeau, A. (eds.), Multimedia Learning: Cognitive and Instructional Issues, Elsevier, Amsterdam, pp. 9–29.
Schnotz, W., and Bannert, M. (1999). Support and interference effects in learning from multiple representations. In Bagnara, S. (ed.), European Conference on Cognitive Science, 27th–30th Oct. 1999, Istituto di Psicologia, Consiglio Nazionale delle Ricerche, Rome, Italy, pp. 447–452.
Shepard, R. N. (1984). Ecological constraints on internal representations: Resonant kinematics of perceiving, thinking, and dreaming. Psychol. Rev. 91: 417–447.
Shuell, T. J. (1988). The role of the student in learning from instruction. Contemp. Educ. Psychol. 13: 276–295.
Sims, V. K., and Hegarty, M. (1997). Mental animation in the visuospatial sketchpad: Evidence from dual-task studies. Mem. Cogn. 25: 321–332.
Tufte, E. R. (1983). The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT.
Ullman, S. (1984). Visual routines. Cognition 18: 97–159.
Van Dijk, T. A., and Kintsch, W. (1983). Strategies of Discourse Comprehension, Academic Press, New York.
Wainer, H. (1992). Understanding graphs and tables. Educ. Res. 21(1): 14–23.
Weaver, C. A., III, Mannes, S., and Fletcher, C. R. (eds.). (1995). Discourse Comprehension, Erlbaum, Hillsdale, NJ.
Weidenmann, B. (1989). When good pictures fail: An information-processing approach to the effects of illustrations. In Mandl, H., and Levin, J. R. (eds.), Knowledge Acquisition From Text and Pictures, North-Holland, Amsterdam, pp. 157–170.
Wertheimer, M. (1938). Laws of organization in perceptual forms. In A Source Book of Gestalt Psychology, Routledge & Kegan Paul, London.
Winn, W. D. (1994). Contributions of perceptual and cognitive processes to the comprehension of graphics. In Schnotz, W., and Kulhavy, R. (eds.), Comprehension of Graphics, Elsevier, Amsterdam, pp. 3–27.