
The Cognitive Semiotics of Virtual Reality

Alison McMahan, Ph.D. Warren Buckland, Ph.D.

Abstract
To advance to the next generation of interaction in virtual reality (including 3-D gaming), we need to rethink how both the interface and the VR environments themselves are designed. McMahan is currently conducting an
experiment, The Memesis Project, on how the sense of presence is altered if a 3-D CAVE environment responds to
the user’s subconscious cues as well as conscious ones. The prototype version of Memesis is designed to resemble a
haunted house that collects information about the user’s phobias and deep-seated psychological fears to provide an
ultimate, more thrilling “haunted house” experience. The biofeedback interface is based on the cognitive principle
that our perceptions as well as our emotions are tied to our bodies. To maximize the dramatic possibilities of the
biofeedback interface we need a new design template, a new set of aesthetic principles, to guide our VR
environment design. This paper applies Warren Buckland’s cognitive-semiotic theory of orientation in film space to
virtual reality, specifically to Memesis. Buckland examines how films simulate spatial and perceptual cues that
enable spectators to mentally orient their body in relation to a film’s fictional environment, and, by extension, enable them to react emotionally to that environment. McMahan then examines how VR environments intensify
this process of spatial simulation and orientation, as a first step towards applying Buckland’s cognitive-semiotic
approach to emotional engagement in VR.

1 Introduction
In his paper “Orientation in Film Space: A Cognitive Semiotic Approach”, Warren Buckland explores “the relation
between literal and fictional space in fiction films, as well as the spectator’s literal and imaginative orientation in
relation to fictional entities (particularly the spectator’s location in relation to the fictional geography)” (Buckland,
2003, 98). Buckland predicts that his cognitive-semiotic theory of film will have various applications beyond film,
including the analysis of the user’s interface with virtual reality systems. The goal of this paper is to test Buckland’s
claim, first with the issue of orientation in immersive virtual reality environments such as CAVEs, and then to look
ahead to the application of his theories to emotional engagement in virtual reality.

To begin with, Buckland examines how films simulate spatial and perceptual cues that enable spectators to mentally
orient their body in relation to a film’s fictional environment. He begins his paper by arguing (in contradiction to
Currie (1995, xii)) that two levels of “make-believe” are necessary for a fiction to be perceived as fiction: spectators
must “take an attitude of make-believe to the events of the fiction… and to the perceptual relations they establish to
those events” (Buckland, 2003, 89). He refers to the Participatory Thesis, or the Imagined Observer Hypothesis
(IOH), which states that, to comprehend the film and to become emotionally engaged, spectators must imagine they
are an observer who is “inside the fiction”.

As the user navigates through the space of the game they must orient themselves. Two fundamental questions we
need to ask about orientation are: What space? And: what type of orientation? The two types of spaces to which
Buckland refers are real space and its cognitive representation – imaginative space. What type of orientation? There
are three basic types: absolute, intrinsic, and contextual. Absolute orientation relies on the cardinal directions (North, South, East, West), which collectively constitute an objective frame of reference within global space. Intrinsic orientation originates from the properties of objects (e.g., their symmetry)
and the frame of reference for intrinsic orientation in a 3-D space consists of three axes: up/down, front/back, and
left/right. Finally, in contextual orientation, the frame of reference consists of two relative points, for example a
speaker and the objects or events to which he or she refers. This type of orientation is called deictic (a notion that should not be limited to verbal language).
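
The distinction can be made concrete computationally. The following sketch (our illustration in Python; the scenario and all names are assumptions, not anything drawn from Buckland or Bühler) reports a target’s position both in absolute terms, as a compass bearing, and in intrinsic/deictic terms, as front/back and left/right relative to an observer’s facing direction:

import math

def absolute_bearing(observer, target):
    """Absolute orientation: compass bearing from observer to target,
    with 0 degrees = North and angles increasing clockwise (East = 90)."""
    dx, dy = target[0] - observer[0], target[1] - observer[1]  # x = East, y = North
    return math.degrees(math.atan2(dx, dy)) % 360

def intrinsic_relation(observer, facing_deg, target):
    """Intrinsic/contextual orientation: where the target lies relative to the
    observer's own front/back and left/right axes (the deictic frame)."""
    bearing = absolute_bearing(observer, target)
    relative = (bearing - facing_deg + 360) % 360      # angle measured from the observer's front
    front_back = "front" if relative < 90 or relative > 270 else "back"
    left_right = "right" if 0 < relative < 180 else "left"
    return front_back, left_right

if __name__ == "__main__":
    me, door = (0.0, 0.0), (3.0, 4.0)
    print(absolute_bearing(me, door))          # ~36.9 degrees: the door lies north-east of me (absolute)
    print(intrinsic_relation(me, 90.0, door))  # facing East, the door is to my front-left (contextual)
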
Buckland discusses contextual orientation in both real and imagined spaces or, more particularly, contextual
orientation in the real and imagined spaces of fiction films (for fiction films combine at the same time real and
imagined spaces). He was influenced by Karl Bühler, who in the 1930s developed a situational model of action,
which involves studying the way individuals are embedded in an environment and how they meaningfully (rather
than randomly) interact with it. For Bühler, the fundamental type of interaction is orientation. Individuals are not,
therefore, passively placed in space, but are actively engaged with, immersed in, and interact with it. One’s very
presence in a space creates a form of contextual orientation, although the orientation process is not complete until
the individual establishes a coordinated relationship with his environment and mentally represents it. Moreover, it is through vision that humans primarily orient themselves in real space and create a mental representation of that space (although other senses, such as hearing and touch, play a lesser role). This is why Bühler calls orientation
in real, physical spaces “ocular deixis” (Bühler, 1990, Chapter 8).

Bühler argues that deictic terms can be used when the individual is coordinated to an imagined visual space. In
opposition to ocular deixis (the orientation of the individual in a real, physical space), Bühler calls orientation in
imagination “imagination-oriented deixis.” Ocular deixis and imagination-oriented deixis can also be combined, as
we will see below, in the discussion of Jeffrey Shaw’s work.

2 Kinesthetic Image Schemas


How is it possible for a spectator to, first, accept the 2-D images on the screen as 3-D, and second, to imagine
herself as an observer inside that 3D space? In his book The Cognitive Semiotics of Film (Buckland, 2000),
Buckland answers this question by turning to the work of cognitive semanticist George Lakoff, who holds an
experientialist view of cognition, which is based on the premise that thought and language are fundamentally
motivated by bodily experience. Objectivists believe that knowledge is represented in the mind by propositions
consisting of meaningless symbols (which gain their meaning via a direct correspondence to external reality).
Cognitive semantics, by contrast, posits that knowledge is represented in the mind in the form of schemata (cognitive structures
that organize perceptual input into experiences). More specifically, cognitive semanticists such as George Lakoff
and Mark Johnson have posited a series of “Kinesthetic Image Schemas” (Lakoff, 1987, Chapter 17) that structure
perceptual input into experiences. These schemata are inherently meaningful because they gain their meaning
directly from the body’s innate sensory-motor capacities. Kinesthetic Image Schemata therefore represent “the body
in the mind” (to use the title of Johnson’s book) (Johnson, 1987) and are posited as being cognitively real because
they are directly motivated or non-arbitrary, and inherently meaningful. Cognitive semantics therefore challenges
the dualism (the mind-body problem) of Cartesian philosophy.

Lakoff and Johnson define embodiment as “our collective biological capacities and our physical and social
experiences as beings functioning in our environment” (Lakoff, 1987, 266-267). Kinesthetic image schemata are
simple structures that arise from the body – up-down, back-front, centre-periphery, part-whole, inside-outside, paths,
links, forces, and so on. These schemata are directly constrained by the dimensions of the human body. And because
the dimensions of the fully-grown human body are shared, fairly uniform and constant, any discussion of conceptual
structure in terms of the body does not fall into radical relativism and subjectivism. Kinesthetic image schemata are
not, therefore, arbitrary, but are directly motivated by a shared and constant bodily experience. The structure of our
shared bodily experience then becomes the basis for rational, abstract thought by means of image based schemata
and creative strategies such as metaphor and metonymy, which project and extend this structure from the physical
domain into the abstract domain of concepts.

Lakoff outlines several kinesthetic image schemata and shows how they determine the structure of abstract
conceptual thought:
• The container schema, which structures our fundamental awareness of our bodies, is based on the elements
“interior”, “boundary”, and “exterior”. In terms of metaphorical extension, Lakoff notes that the visual field
is commonly conceived as a container, since things “come into” and “go out of” sight. Our understanding of emergence out of various states of mind (drunkenness, dizziness, and so on) is also a metaphorical expression of the container schema.
• The part-whole schema, also fundamental to our self-awareness, is based on our perception of our bodies as
wholes made up of parts. As an example of metaphorical extension, Lakoff mentions that families are
understood in terms of the part-whole schema.
• The link schema is based on our awareness of our position in relation to others, in which this sense of
position enables us to establish a link between self and other. Social and interpersonal relations are
metaphorically conceived in terms of “making connections” or “breaking social ties”, whereas freedom is
understood as the absence of any links “tying us down.”
• The center-periphery schema is similar to the part-whole schema in that it is based on our awareness of our
bodies as having centers (the trunk and the internal organs) and peripheries (limbs, hair, etc.). The center gains its importance from the basic fact that preserving the body’s center is more vital to survival than preserving its peripheries.
• The source-path-goal schema is based on our experience of bodily movement in a particular direction from
one point to another along a path. Its many metaphorical extensions include its structuring of one’s long-
term aims and ambitions, which become “sidetracked” or “blocked” by obstacles.
• The balance schema is based on our bodily experience of ourselves as a symmetry of forces relative to an
axis. This is why symmetry and balance are pleasing in artworks: because they imitate the symmetry and
balance of the perceiver’s body (Buckland, 2000, 40-44; Lakoff, 1987, 269-275).

To show how these schemata relate to film, Buckland applies the container schema to the film frame and diegesis
(the story world of the film). To argue that frame and diegesis are understood in terms of a kinesthetic image schema
is to suggest that they are comprehended in terms of our experience of our bodies as containers. The frame in
particular is analogous to (or is a reduplication of) sight itself, which, as Lakoff remarks, is also understood in terms
of the container schema.

The term ‘diegesis’ was first introduced into film studies by Etienne Souriau, who distinguished seven levels of
filmic reality:
• Afilmic reality (the reality that exists independently of filmic reality)
• Profilmic reality (the reality photographed by the camera)
• Filmographic reality (the film as physical object, structured by techniques such as editing)
• Screenic (or filmopathic) reality (the film as projected on a screen)
• Diegetic reality (the fictional story world created by the film)
• Spectatorial reality (the spectator’s perception and comprehension of a film)
• Creational reality (the filmmaker’s intentions) (Souriau, 1951, 231-240).

Souriau’s discussion of the way these levels of reality relate to one another can be reformulated in terms of the
container and in-out schemata. The in-out schema delimits the first boundary – between afilmic and profilmic
reality. The afilmic exists outside the realm of the cinema, whereas the profilmic exists inside. But both these levels
are then defined as existing outside the filmic text. This establishes a new boundary – between extra-textual and
textual reality. Whereas the afilmic and profilmic are extratextual, the following three types of filmic reality are
textual and are structured in relation to one another in terms of the container schema: The filmographic contains the
screenic, and the screenic contains the diegesis. The final two levels of filmic reality are cognitive, referring to the
spectator’s comprehension of film and the film as conceived by the filmmaker (Buckland, 2000, 47).

In Souriau’s terms, the frame exists on both the filmographic and screenic levels of filmic reality, and serves as a
container for the diegesis. But the relation between frame and diegesis in the cinema is complex. In painting and
film, the frame serves as a boundary between what is seen and what remains unseen. Nonetheless, there is a
fundamental difference between the function of the frame in painting and in film. In painting, the frame acts as an
absolute boundary; it unequivocally severs the bounded space from its surroundings. But in cinema, the frame is
mobile. The boundary between bounded and unbounded space is equivocal. In filmic terms, there is an opposition
between on-screen space and off-screen space. On-screen and off-screen space are themselves understood in terms
of containment, since they contain the film’s diegesis, or fictional story world. So the container schema operates at
two levels of the film - the frame (which acts as a boundary between on-screen and off-screen space) and the
diegesis (which acts as a boundary between fiction and non-fiction). Off-screen space has an unusual semiotic
specificity, since it exists between the filmographic level and screenic level of filmic reality. It is an invariant
property of filmographic reality, but a variant property of screenic reality. In these terms, we can think of it as the
potential, non-manifest stage of on-screen space.

Off-screen space is therefore that part of the first container (the diegesis) that does not appear on screen and in frame
(= the second container) at any one moment. The space of the auditorium can then be considered as the third and
ultimate container. Although visual stimuli are contained within the frame and diegesis, sound is only contained by
the third container, the auditorium. The auditorium is a container that contains the screen and frame. The frame is a
container that contains the diegesis, the fictional story world (characters, settings, and actions). The auditorium,
frame, and diegesis are container objects; what they contain are container substances - the auditorium contains the
screen and frame (filmographic and screenic reality); the screen and frame contain the diegesis; and the diegesis
contains the fictional story world.

The frame can be comprehended in more specific ways as well. For example, whereas the framing of an objective
shot is comprehended as representing the vision of a non-diegetic narrator, a subjective shot is comprehended as
embodied – as representing the vision of a character existing in the diegesis. The frame in a subjective shot is
therefore comprehended as being controlled within the diegesis, whereas the frame of an objective shot is perceived as disembodied and non-diegetic, though it too is embodied in the film spectator’s body. Of course, the subjective
shot is also ultimately embodied in the film spectator, but that embodiment is duplicated in the film’s diegesis.

The narrative film in itself is understood as a container, as is evident in everyday comments people make after
seeing a film – ‘there was not much in the film’ (the evaluation of a film is judged in terms of quantity – the more it
contains, or the more that happens, the better); ‘I became immersed in (or absorbed into) the film’ (here the spectator
metaphorically views herself as an object drawn into the film as container), and so on. Stephen Heath (1981)
employs the term suture to describe the process that ‘stitches’ the spectator into an imaginary relation to the filmic
image. By contrast, from a cognitive semantic perspective we can argue that the spectator’s projection into the film
is in fact the result of the container schema. This schema, directly motivated by the spectator’s body, is
metaphorically projected onto the film by the spectator in her ongoing construction of a mental representation of
each space in the film’s diegesis as a coherent container. The spectator uses the various shots in the scene to
construct a mental representation of the space – say, a room with four people sitting around a table – where the
filmed action is taking place. Once the spectator has succeeded in constructing the space in her mind, she is
‘absorbed’ into that space. Certain shot sequences, such as breaking the 180 degree rule (or “crossing the line” that
divides the camera’s range of angles from the action in classical Hollywood film style), will give the spectator
information that might contradict the kinesthetic image of that room that she has constructed in her mind. Depending
on the nature of these contradictory shots, the spectator might accept them as anomalous or might have to revise her
mental schema of that room.
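
As a rough geometric gloss on “crossing the line” (our own sketch, not a formalization found in the film literature), the axis of action can be treated as the line between two characters, and a cut crosses the line when successive camera positions fall on opposite sides of that axis:

def side_of_axis(char_a, char_b, camera):
    """Sign of the 2-D cross product: +1 and -1 are the two sides of the axis of action,
    0 means the camera sits exactly on the line."""
    ax, ay = char_a
    bx, by = char_b
    cx, cy = camera
    cross = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (cross > 0) - (cross < 0)

def crosses_the_line(char_a, char_b, camera_1, camera_2):
    """True if the cut from camera_1 to camera_2 jumps across the axis of action."""
    s1 = side_of_axis(char_a, char_b, camera_1)
    s2 = side_of_axis(char_a, char_b, camera_2)
    return s1 != 0 and s2 != 0 and s1 != s2

if __name__ == "__main__":
    a, b = (0, 0), (4, 0)                           # two characters facing each other along the x-axis
    print(crosses_the_line(a, b, (1, 3), (3, 2)))   # False: both setups stay on the same side
    print(crosses_the_line(a, b, (1, 3), (3, -2)))  # True: the second setup jumps across the line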

3 The Schemas and Virtual Reality

3.1 The Container Schema


Immersive virtual reality environments, such as CAVEs™, are more literal embodiments of our projection of the
container schema than the cinema. CAVEs™ are rooms of 3 x 3 x 3 meters with 3-D images projected on the floor and walls and with surround sound – in other words, the “containers” of movie theatre auditorium and movie screen have been blended, overlapped, collapsed into each other. While the film spectator has to construct a container schema of the room with four people sitting in it in his mind, with only limited cues from the film to go by, a range
of 3-D computer software from CAD programs to immersive environments such as CAVEs™ enable spectators to
actually “walk” through a space. By the same token, someone, such as an architect, who has a mental schema of a
room in their mind and wants to make it available to others can use the same programs to create a version of that
space in the computer. We feel that film theory can be usefully applied to virtual reality because the first-person or egocentric perspective of VR environments, such as CAVEs™, eliminates one level of container projection demanded of the viewer of a film. Instead of having to translate the 2-D screen of the film into a 3-D mental space, as the cinema spectator does, a VR user simply steps into a CAVE™, turns on the computer, and puts on 3-D goggles. In other words, what Souriau called the screenic or filmographic level of filmic reality is no longer a
kinesthetic mental projection of the spectators, but a materialized container that the user is immersed in. Therefore
virtual reality (at least, VR with an egocentric, or filmic, perspective) is a natural next phase to the project of
cinema.

In film, the 2-D screen simulates a 3-D space through various cinematic tools used to give the spectator a sense of
being absorbed in the diegesis. 3-D, navigable spaces are much further along the continuum between simulation and
copy (though of course we are nowhere near achieving the “ideal” of the holodeck). The aspects of 3D modeling
that we are so familiar with are designed to further our sense of the projection being a volumetric space. In polygon-
only systems such as those used with most 3-D VR environments, these include a mesh level of detail, scale, height
maps, textures, colors, lighting, and depth cueing (Morris and Hartas, 2003, 140-145). Often the worlds projected in
3D are themselves in the shape of containers: how many of us have modeled a dome for a sky over a flat grid for a
terrain? Or hidden a little world inside a spherically shaped sky? And so, in a six-wall CAVE or with a VR helmet, the 2-D flat screen of the cinema is turned into a screen-as-container; and within that physical container exists a multiplicity of graphically rendered containers, within which the action of the game or VR environment takes place.
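
A minimal sketch of such a container-shaped world, written here in plain Python and assuming no particular engine or library, might generate a flat terrain grid enclosed by a hemispherical sky dome:

import math

def terrain_grid(size=100.0, steps=10):
    """Flat terrain: a (steps+1) x (steps+1) grid of (x, y, z) vertices at height zero."""
    step = size / steps
    return [(i * step - size / 2, 0.0, j * step - size / 2)
            for i in range(steps + 1) for j in range(steps + 1)]

def sky_dome(radius=80.0, rings=8, segments=16):
    """Hemisphere of vertices over the origin: the 'sky' half of the container."""
    verts = [(0.0, radius, 0.0)]                    # zenith
    for r in range(1, rings + 1):
        phi = (math.pi / 2) * r / rings             # 0 at the zenith, pi/2 at the horizon
        for s in range(segments):
            theta = 2 * math.pi * s / segments
            verts.append((radius * math.sin(phi) * math.cos(theta),
                          radius * math.cos(phi),
                          radius * math.sin(phi) * math.sin(theta)))
    return verts

if __name__ == "__main__":
    ground, dome = terrain_grid(), sky_dome()
    print(len(ground), "terrain vertices,", len(dome), "dome vertices")
    print(all(v[1] >= 0 for v in dome))             # every dome vertex sits above the terrain plane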

In many of his VR art installations Jeffrey Shaw (Duguet, Klotz & Weibel, 1997) has worked around this concept, in such a way that we are forced to reconsider the mental schemas we take for granted. For example, his museum installation Viewpoint (1975) provided the museum visitor looking through a viewing console with a real view of the room in front of the console as well as with virtual events that had been filmed earlier in that same space. A similar principle applied in Alice’s Room (1989), in which the viewer used a joystick to rotate a large monitor on a 360-degree platform; the virtual space (the digital image on the screen) and the real space (the museum room) were
optically aligned so that the viewer facing a door or a window in the real room would also be facing the same
features in the simulation. In other works Shaw abandons the idea of including the real museum space and instead
treats the screen as a moving window that the user can rotate to get different views of a virtual world. In Eve
(Extended Virtual Environment) (1993), two projectors are mounted on a robot arm with a pan and tilt device inside
an inflated dome with bare white walls. The images are projected in stereo and the user wears goggles to see the
projected images in 3-D. A tracking device on the user’s helmet cues the robot arm as to where to position the screen window. In other words, the entire inside of the dome is a virtual world, but the user can only see it through a movie-screen-like window at any given moment. Reviewing the work of Jeffrey Shaw gives us new insight into how
container schemas are duplicated in our current visual media. But container schemas are not the only schemas of
interest here. Buckland applied Lakoff’s five other kinesthetic image schemas to film, and to varying degrees they
are of interest for virtual reality as well.

3.2 The Part-Whole Schema


In terms of the part-whole schema, a narrative film is of course understood to be a whole made up of parts – shots
and scenes. One of the main issues addressed by film semiotics was to identify these parts by means of segmentation
and classification, such as Christian Metz’s Grande Syntagmatique film segmentation system (Metz, 1974).
Machinimists, that is, animators who make films using 3-D game engines such as Quake, Unreal, or Half-Life, are
evolving such a part-whole schema for 3-D spaces by treating machinima films as an outgrowth of cinema, though
the opportunities and limitations of machinima are leading to certain changes. For example, a 360-degree pan, or circling camera, difficult to achieve on a film set, is very easy to do in machinima and as a result tends to be
overused. Slow motion (“slomo”) is necessary to make action sequences read clearly, but in films it is usually used
for emotional emphasis (and it is generally avoided in customized game levels). Fades-to-black are used in between
scene changes, not to indicate the passage of time, as they do in traditional films, but to cover up the fact that a new digital map is being loaded. Instead of talking about POV or depth of field, machinima filmmakers talk about Field of View (FOV). A wide FOV of 90 degrees, comparable to that of a fish-eye lens, is typical for mod applications, but 52 degrees is closer to the field of view of a standard cinematic lens (25mm).
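
The correspondence between FOV and lens focal length can be sketched with the standard rectilinear-lens formula; the 24.9 mm gate width below is our assumption (roughly a Super 35 frame), used only to show why a 52-degree FOV lands in the neighbourhood of a 25 mm cinema lens while a 90-degree FOV behaves like a far wider lens:

import math

def horizontal_fov(focal_length_mm, gate_width_mm=24.9):
    """Horizontal field of view (degrees) of a rectilinear lens on a gate of the given width."""
    return math.degrees(2 * math.atan(gate_width_mm / (2 * focal_length_mm)))

def focal_length_for_fov(fov_deg, gate_width_mm=24.9):
    """Inverse: the focal length that yields a given horizontal FOV on that gate."""
    return gate_width_mm / (2 * math.tan(math.radians(fov_deg) / 2))

if __name__ == "__main__":
    print(round(horizontal_fov(25)))        # ~53 degrees, close to the 52-degree figure above
    print(round(focal_length_for_fov(90)))  # ~12 mm, i.e. a very wide-angle lens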

Machinima tutorials (www.machinima.com) encourage machinimists who must choose between the visual conventions of first-person shooters and those of cinema to choose the conventions of cinema. The texts most recommended are Steven Katz’s Film Directing: Shot by Shot and Film Directing: Cinematic Motion, and Daniel Arijon’s Grammar of the Film Language. These are well-known technical manuals on film directing (McMahan, 2005,
Chapter 2). To a filmmaker this feels like a missed opportunity: what is most attractive about the possibilities of
machinima is its promise to escape from the limitations of the classical Hollywood cinematic language, just as the
promise of VR is to escape from the limitations of the 2-D screen. However, machinima is still in its early days,
used primarily to make short films that serve as portfolio examples for animators or for cutscenes in games. It is safe
to predict that as machinima becomes more widespread (and as open-source engines expand its commercial
possibilities) it will evolve its own way of relating its parts to its whole.

3.3 The Link Schema


Relating the parts to the whole is actually described by Lakoff as a “link” schema. Humans are born attached to their
mothers through an umbilical cord, and continue to hold onto, or be held by, family members throughout infancy
and childhood. This early experience of being linked becomes a mental schema, so that social and interpersonal
relationships are understood in terms of being linked, and divorce is seen as “splitting up”. The linking schema also
works with any kind of structural elements, such as two buildings with a walkway that connects them.

The part-whole schema is closely aligned to the link schema, since a collection of parts can only be perceived as a
whole if the parts are linked together coherently. Many continuity editing techniques achieve this impression of
coherence, and can therefore be understood in terms of the link schema. Some shots are central to a scene, such as
the establishing shot, while others are less central, such as an insert shot, or a close-up of an important detail. While
early machinima practice consisted of emulating the linking patterns and hierarchies of the cinema, when it comes to
immersive technologies the linking schema is most applicable to the interface. It is the interface – in a CAVE, the screens, the goggles, the glove and/or the wand – that enables the user to bridge the gap between the a-virtual reality of “meatspace” and the virtual reality of “cyberspace”. Meatspace is the level of virtual fiction most
comparable to Souriau’s Afilmic reality (the reality that exists independently of filmic reality) and cyberspace is
most akin to Souriau’s Profilmic reality (the reality photographed by the camera). The virtual-fiction equivalent of what Souriau called Filmographic reality (the film as a physical object, structured by techniques such as editing) is, for virtual reality users (and for computer game players who use less immersive technologies, such as consoles hooked up to television sets, or desktop computers), already a level of engagement with the virtual fictional world.

Edward Branigan has developed a sophisticated theory of narrative agents and levels of narration in the cinema. He
defines narrative as a cognitive activity, for it organizes data into patterns that represent experience (Branigan,
1992). He then identifies eight levels of narration, of which the top three (historical author, extra-diegetic narrator, and non-diegetic narrator) are outside the diegesis. The bottom five (diegetic narrator, character or nonfocalized narration, external focalization, internal focalization (surface), and internal focalization (depth)) operate inside the diegesis. (See Branigan, 1992, Chapter 4, for definitions of these terms.)

The ideal relation a narrative fiction film attempts to establish with the spectator is one of complete absorption in the film’s diegesis. Anything non-diegetic, such as a music score that is too intrusive and reminds viewers that they are sitting in a cinema, or extra-diegetic, such as the poor amplification of a run-down movie theater’s speakers, is considered a violation or at least a weakness (unless we are talking about experimental or avant-garde films with a deliberate aim of keeping viewers from getting immersed in the film’s story), and a depiction of the historical author, such as Hitchcock or Rod Serling introducing their own TV shows, is simply public relations.

In a virtual reality fiction, such as Memesis or a computer game like Asheron’s Call, the extra-diegetic and non-diegetic levels of interaction are a constant necessity for the user, even as he is fully engaged in the game – because in virtual narration “fully engaged” means that the user is required to experience the virtual fiction through multiple levels of narration simultaneously at all times. This is a far cry from cinema, where only a handful of films with complex narratives engage the spectator at multiple levels of narration, and then only for brief moments, often as brief as one shot and
usually no longer than one sequence. The focalizer is a fundamental narrative agent who acts as a mediator between
the fictional events and the spectator. With the assistance of the focalizer, the spectator becomes contextually and
intrinsically oriented in fictional space. When a viewer watching a film is engaged by multiple levels of narration
simultaneously, these are usually at levels within the diegesis, with different levels of focalization (a man
remembers a dream which depicted an event that actually occurred).

In his paper “Orientation in Film Space: A Cognitive Semiotic Approach” Buckland also notes that levels of fiction
(or multiple diegeses within one narrative text) “exist on the boundary between external and internal focalization”
(Buckland, 2003, 99). Buckland uses the films Total Recall and eXistenZ as examples. Total Recall (Paul
Verhoeven, 1990) presents three alternative “levels of reality” for the main protagonist Doug Quaid to inhabit. At
key points in the film, both Doug Quaid and the spectator are confused about which level of reality is operative. Is
Quaid simply a lowly construction worker on Earth? Or is he strapped into a chair at Rekall being fed memories of a
trip to Mars, pretending to live the life of a secret agent? Or is he really a secret agent on Mars, working in collusion
with Cohaagen to kill the leader of the Mars resistance? The spectator is placed with or focalized around Quaid’s
experience of these different levels of fiction. Like Quaid, we are disoriented for we do not know which level of
fiction is “true,” and which ones are false. eXistenZ (David Cronenberg, 1999) also presents three levels of reality,
and creates ambiguity concerning what level the characters are living in. But by the end of the film, it is evident that
the film began on level two (in a computer game), progressed to level three (a computer game within a computer
game), and then ended on level one (the characters’ real life) – although a throwaway ending “challenges” this
reading. The confusion of levels is simply caused by the fact that each level imitates the others, and the film begins
on level two, not level one.

3.4 Center-Periphery Schema

Levels of focalization and the center-periphery schema are a natural match. Buckland applied the center-periphery
schema to a hierarchy of shot types (Buckland, 2000), which is too similar to the part-whole and linking schemas to
be really useful. However, focalization, especially in interactive fiction, works precisely in this way: we experience
our bodies as having centers (the trunk and internal organs) and peripheries (limbs, hands and feet, hair). We view
our centers as more important than our peripheries, so that someone who has lost a limb is still the same person. This
schema has three important elements: an entity, a center, and a periphery (Lakoff, 1987, 274). Focalized levels of
narration emphasize the character’s direct experience of events. The exocentric perspective of VR is analogous to what we mean by external focalization in film, and the egocentric perspective matches internal focalization (surface). However,
focalization in interactive narrative has an additional layer of complexity: first there is the user her- or himself, a real
body in an extra-diegetic space, but linked to the diegesis directly through the interface. For example, in the typical CAVE™ experience, one person wears the tracked glasses that dictate the perspective and orientation of the image, while the other people standing close by wear glasses that are not attached to the tracking system. The “helmet” person also has a joystick with which to control the travel through the spaces. No one person has exactly the same perspective as the others. Then there is the avatar. This can be a visible avatar that the user relates to exocentrically (as in all those over-the-shoulder games, such as the Tomb Raider series, where the user is always one step behind their avatar) or, as is most common in CAVE™ VR environments, egocentrically.

The avatar has been programmed in for us: in Memesis, for example, the default “height” is six feet and the default width of the head is two feet, which means, among other things, that you cannot walk through an arch that is scaled to five feet. In Memesis the avatar is usually invisible, represented on occasion by a hand that helps the user
accomplish tasks in the virtual space. (The invisibility is motivated by Memesis’ approach to emotional engagement,
of which more below.) Other VR environments, such as NICE, though always egocentric, allow the user to get a
glimpse of their avatar; in NICE this happens when the user looks at their reflection in a pond (Sherman, 2003, 466).
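
The kind of constraint that the avatar’s built-in dimensions impose can be sketched as a simple clearance check; the class and numbers below are our illustration (the defaults echo the Memesis figures quoted above) rather than Memesis’s actual code:

FOOT = 0.3048  # metres per foot

class Avatar:
    def __init__(self, height_ft=6.0, head_width_ft=2.0):
        # Hypothetical defaults, matching the six-foot height and two-foot head width cited above.
        self.height = height_ft * FOOT
        self.width = head_width_ft * FOOT

    def fits_through(self, opening_height_ft, opening_width_ft):
        """True only if the opening is at least as tall and as wide as the avatar."""
        return (self.height <= opening_height_ft * FOOT and
                self.width <= opening_width_ft * FOOT)

if __name__ == "__main__":
    user = Avatar()
    print(user.fits_through(5.0, 3.0))   # False: a five-foot arch is too low for a six-foot avatar
    print(user.fits_through(7.0, 3.0))   # True: a seven-foot arch poses no problem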

Many VR researchers are focused on the effect of varying degrees of agency and anthropomorphism on the user’s sense of presence. For example, an experiment by Kristine L. Nowak and Frank Biocca that varied the level of anthropomorphism of the interactants’ images produced surprising results. People responded the same way to the agents whether they thought the agent was controlled by a human or a computer, suggesting that if it acts human, we relate to it as human. Users who tested the less anthropomorphic avatar images (eyes and mouth floating in space) reported a higher sense of copresence and social presence than those who interacted with avatars that had no image at all and those who interacted with highly anthropomorphic images (the researchers concluded that the latter set up higher expectations that led to reduced presence when these expectations were not met) (Nowak and Biocca, 2003).

And what about internal focalization (depth), the more complex experiences of thinking, remembering, interpreting, wondering, fearing, believing, desiring, understanding, and feeling guilt that are so well depicted in film? This is where software programming can really add something to the avatar. Leon Hunt gives the example of martial arts games, such as the Tekken series, which enable the user to “know Kung Fu”. These games allow your avatar to incorporate the martial arts moves of various martial artists, as well as the signature gestures of various film stars playing martial artists (Hunt, 2002). As a result the internal depth focalization of this avatar – it knows Kung Fu, even if its user does not – is given authenticity by extra-diegetic signs: the signature moves of well-known martial artists and the gestures of movie stars. So an avatar’s skills, whether it is a rogue, wizard, or warrior, and any backstory it shares with its user, can all be described as internal focalization (depth).

There are computer games and virtual reality environments where the user has no avatar at all. In tabletop VR, or god’s-point-of-view games (such as most strategy games with isometric design), the user has a lot of control over events but no digital representation. This doesn’t mean that there is no narratee position for the user. In games like Creatures or Black & White, for instance, users care for the little creatures or select which of the game’s denizens will evolve and which will not. The range of possible choices and the specific choices made become the user’s narratee position in the text, a position of focalization without direct representation.

Espen Aarseth identifies these three positions as intriguee (the target of the game’s intrigue, whom he also calls the “victim”), narratee (the textual position outlined for the player), and puppet (or avatar). To explain the difference between these three functions, he gives the example of character death: “the main character [the avatar or puppet] is simply dead, erased, and must begin again. The narratee, on the other hand, is explicitly told what happened, usually in a sarcastic manner, and offered the chance to start anew. The user, aware of all this in a way denied to the narratee, learns from the mistakes and previous experience and is able to play a different game” (Aarseth, 1997, 113, quoting Aarseth, 1994, 73-74). In other words, the avatar is at a level of focalization, the narratee is at the level of non-diegesis, and the intriguee or user is at the level of extra-diegesis.

3.5 Source-Path-Goal Schema

This is the last schema that Lakoff treats in depth. Buckland points out that narrative trajectories can be understood
in terms of the source-path-goal schema. This is true for interactive narratives as well, even though interactive
narratives have multiform plot structures, because there is usually one beginning (though some games have multiple
beginnings, so that the user doesn’t start in the same place each time they play), a multiplicity of stops along the way
that can be navigated in different ways, and one or more endings. Espen Aarseth describes this way of looking at
navigation as ergodic, in which the traverser of the interactive narrative effectuates a semiotic sequence, a process
that requires an extra-noematic effort as well as a mental one (Aarseth, 1997, 1).
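
Such a multiform yet still source-path-goal structure can be sketched as a directed graph of episodes; the module names below are invented, and enumerating the paths simply shows how a single source and a small set of endings can still support many distinct trajectories:

PLOT = {
    "start":    ["hallway", "cellar"],
    "hallway":  ["cellar", "attic"],
    "cellar":   ["attic", "ending_a"],
    "attic":    ["ending_a", "ending_b"],
    "ending_a": [],
    "ending_b": [],
}

def all_paths(graph, node="start", path=None):
    """Depth-first enumeration of every trajectory from the source to an ending."""
    path = (path or []) + [node]
    if not graph[node]:                       # no outgoing links: this node is an ending
        return [path]
    return [p for nxt in graph[node] for p in all_paths(graph, nxt, path)]

if __name__ == "__main__":
    for p in all_paths(PLOT):
        print(" -> ".join(p))                 # eight distinct trajectories, two possible endings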

4 Applying the Cognitive Model to Memesis


Memesis is designed for an egocentric, immersive, 3D VR environment; McMahan built the prototype for use in
CAVEs™, using the Ygdrasil software written by Dave Pape. The interactive narrative of Memesis is a horror story,
an extension of the cinematic horror genre. All interactants begin their Memesis experience at the same spot, but
from there can have a variety of experiences, each one designed to lead the user to experience any one of half a
dozen basic human fears: the fear of abandonment, of betrayal, of humiliation, and so on. Each of these fears is
paired with a phobia. The only avatar representation is a hand, to enable users to better act within the environment;
otherwise the avatar is imageless, as the design of the game is based on a collapsing of the characteristics of avatar,
narratee, and user. To further bring the user into the game, the linking mechanism is a biofeedback (galvanic skin
response) interface. The biofeedback data is collected as the user traverses the game and determines what his final experiences will be (another version of the phobic experience and of the basic-fear experience that elicited the greatest anxiety responses). Upon completing the game (which is only meant to be played once by each user, along
the lines of an art installation) the user will have learned something about her own psychological makeup that she
might not have known before.

The first conscious choice in designing Memesis was to have an episodic narrative of loosely-interrelated events,
which will make sense even if the user only experiences some of them (one or two modules are sequentialized,
meaning the user has to experience one before he can experience the other). The experience has two basic levels: the
first level contains the experiences which are used for collecting the biofeedback data, and the second level is the
user’s “rewards” – the most fear-inducing, phobia-inducing episodes (usually two back to back) that the
environment has to offer someone with their particular biofeedback profile. (McMahan and Tortell conducted a
human subject study to establish the basis for experience assignment to biofeedback results; these settings will be
altered once the installation is up and running and there is more data (McMahan & Tortell, 2004)).
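
The selection logic described above can be sketched as follows; this is a hedged illustration in Python, not Memesis’s actual Ygdrasil implementation, and the module names and GSR readings are invented:

from collections import defaultdict

def record(profile, module, category, gsr_delta):
    """Store the peak galvanic-skin-response rise observed while the user was inside a module."""
    profile[category][module] = max(profile[category].get(module, 0.0), gsr_delta)

def level_two_rewards(profile):
    """Pick the most arousing module from each category for the back-to-back 'reward' episodes."""
    return {category: max(modules, key=modules.get)
            for category, modules in profile.items() if modules}

if __name__ == "__main__":
    profile = defaultdict(dict)
    record(profile, "spider_corridor", "phobia", 2.1)
    record(profile, "narrow_shaft", "phobia", 3.4)
    record(profile, "abandonment_scene", "basic_fear", 1.8)
    record(profile, "betrayal_scene", "basic_fear", 2.6)
    print(level_two_rewards(profile))  # {'phobia': 'narrow_shaft', 'basic_fear': 'betrayal_scene'}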

The most significant decision was to take advantage of the tri-partite nature of the user while playing, the three positions of which were labelled intriguee, narratee, and avatar (or puppet) above. Sequential narrative, which assumes a causal connection
between a sequence of events and is seen most frequently in films and literature, does not work very well in
interactive fiction. This means that the narratee position is weaker in interactive fiction than in sequential fiction.
The user is also limited in how much control she has over the avatar; she can dictate most of its moves, depending
on her skill level, but not much of its basic programming (its internal focalization), except by choosing which
game to play. This leaves the intriguee, or player position, as a position of strength for the virtual environment.
Memesis is designed to elicit a strong emotional response based on these cognitive theories, and preliminary runs
have shown it to work rather well.

5 Conclusion

This paper is meant as a starting point for further research. By focusing first on applying cognitive-semiotic theories
from film to virtual reality, specifically orientation, we hope to have laid the groundwork for a deeper theorizing of
how emotion is conveyed and elicited in interactive narratives. Buckland plans to continue to expand his cognitive-
semiotic theory to include newer cognitive theories of emotion, and McMahan plans to continue to apply these
theories to Memesis in both its construction and practice, as well as to other forms of virtual reality.

5.1 References

Aarseth, E. J. (1994). Nonlinearity and Literary Theory. In Hyper/Text/Theory, George P. Landow, ed. Baltimore: Johns Hopkins University Press, 51-86.
Aarseth, E. J. (1997). Cybertext: Perspectives on Ergodic Literature. Baltimore and London: The Johns Hopkins University Press.
Branigan, Edward (1992). Narrative Comprehension and Film. London and New York: Routledge.
Buckland, W. (2000). The Cognitive Semiotics of Film, Cambridge, U.K.: Cambridge University Press.
Buckland, W. (2001). Black Cats, Dark Rooms, and Paper Tigers: A Reply to Petric and Grodal. Film-Philosophy, 5(13). Retrieved February 20, 2005, from http://www.film-philosophy.com/vol5-2001/n13buckland
Buckland, W. (2003). Orientation in Film Space: A Cognitive Semiotic Approach. Recherches en communication, 19, 87-102.
Bühler, K. (1990). Theory of Language. Donald Fraser Goodwin, transl. Amsterdam: John Benjamins. Chapter 8.
Currie, G. (1995). Image and Mind: Film, Philosophy, and Cognitive Science. New York: Cambridge University Press, p. xii, 166-67.
Duguet, A-M., Klotz, H., and Weibel, P. (1997). Jeffrey Shaw – A User’s Manual: From Expanded Cinema to Virtual Reality. Cantz: Edition ZKM.
Heath, S. (1981). Questions of Cinema. London: MacMillan. Chapter 3.
Hunt, L. (2002). ‘I Know Kung Fu!’ The Martial Arts in the Age of Digital Reproduction. In ScreenPlay: cinema/videogames/interfaces, Geoff King & Tanya Krzywinska, eds. London and New York: Wallflower Press, 194-205.
Johnson, M. (1987). The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. Chicago: University of Chicago Press.
Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. Chicago: University of Chicago Press.
McMahan, A. (2002, July). Sentient VR: The Memesis Project (Report of a Work in Progress). Proceedings of the
6th World Multiconference on Systemics, Cybernetics and Informatics, Nagib Callaos, Marin Bica and Maria
Sanchez eds., Orlando, FL.: International Institute of Informatics and Systemics (IIIS), 467-472.
McMahan, A. (2003, October). Memesis: A Prototype in Biofeedback and Virtual Reality Narration for CAVEs.
VSMM2003: Hybrid Reality: Art, Technology and the Human Factor, Proceedings of the Ninth International
Conference on Virtual Systems and Multimedia, Montreal, Hal Thwaites, ed. Montreal, Canada: Hexagram
Institute, 694-702.
McMahan, A. (2005). The Films of Tim Burton: Animating Live-action in Hollywood. New York: Continuum.
Forthcoming.
McMahan, A. & Tortell, R. (2004, March). Virtual Reality and the Internal Experience. Paper presented at the
Virtual Reality for Human Consumption, co-located with Haptics Conference. Chicago, IL.
Metz, C. (1974). Film Language: A Semiotics of the Cinema. New York: Oxford University Press.
Morris, D. and Hartas, L. (2003). Game Art: The Graphic Art of Computer Games. New York: Watson-Guptill. 140-
145.
Nowak, K. L. and Biocca, F. (2003). The Effect of Agency and Anthropomorphism on Users’ Sense of
Telepresence, Copresence, and Social Presence in Virtual Environments. Presence, 12:5, 481-494.
Sherman, W. R. & Craig, A. B. (2003). Understanding Virtual Reality: Interface, Application and Design. Amsterdam: Morgan Kaufmann Publishers.
Souriau, E. (1951). La structure de l’univers filmique et le vocabulaire de la filmologie. Revue Internationale de
Filmologie, 7-8.
