
Entertainment Computing 5 (2014) 271–283


An interaction approach to computer animation


Benjamin Walther-Franks, Rainer Malaka
Research Group Digital Media, Universität Bremen, Fb3, Bibliothekstr. 1, 28359 Bremen, Germany

Article info

Article history:
Received 31 March 2014
Revised 15 July 2014
Accepted 19 August 2014
Available online 2 September 2014
Keywords:
Motion design interfaces
Performance animation
Human–computer interaction
Design space

Abstract
Design of and research on animation interfaces rarely use methods and theory of human–computer interaction (HCI). Graphical motion design interfaces are based on dated interaction paradigms, and novel procedures for capturing, processing and mapping motion are preoccupied with aspects of modeling and computation. Yet research in HCI has come far in understanding human cognition and motor skills and how to apply this understanding to interaction design. We propose an HCI perspective on computer animation that relates the state-of-the-art in motion design interfaces to the concepts and terminology of this field. The main contribution is a design space of animation interfaces. This conceptual framework aids relating strengths and weaknesses of established animation methods and techniques. We demonstrate how this interaction-centric approach can be put into practice in the development of a multi-touch animation system.
© 2014 Elsevier B.V. All rights reserved.

1. Introduction
Moving images are omnipresent in cinema, television, computer games and online entertainment. Digital media such as text, images and film are nowadays produced by a diverse crowd of authors, ranging from beginners and laymen to professionals. Yet animation is still seen by most people as a highly sophisticated process that only experts can master, using complex interfaces and expensive equipment. However, consumer motion capture technology has recently enabled and created a mass-market for easy-to-use animation tools: computer games. In contrast to most professional animation tools, recent games employ full-body interaction, for instance via Kinect, allowing users to control a virtual character instantaneously through their body. This trend is feeding back into the area of the experts, with researchers investigating time-efficient interfaces for computer puppetry using the Kinect (e.g. [61,55]). Computer animation is currently seeing an influx of ideas coming from the world of easy-to-use game interfaces made for players with no prior training. Game designers in turn are informed by design knowledge and methods developed over decades of research in human–computer interaction (HCI).
It is thus time that computer animation be approached from an HCI perspective. This could aid describing and analyzing the vast spectrum of animation techniques ranging from very intuitive puppetry interfaces for computer games to highly sophisticated control in advanced animation tools. Our goal is to understand principles that underlie human–machine interactions in computer animation. With new ways of thinking about interactions with continuous visual media and a thorough investigation of new animation interfaces on a theoretical foundation, motion design interfaces can be made more beginner- and expert-friendly.
This can be achieved by embedding computer animation methods and interfaces in an HCI context. Trends in motion design interfaces can be connected with discussions on next-generation interfaces in HCI. Theoretical frameworks can support a profound analysis of concrete user interface issues, which in turn aids the design of new mechanisms for more natural and intuitive means of motion creation and editing.
This article approaches this goal in three main steps. We will first review related work from computer graphics, human–computer interaction and entertainment computing from a user- and interface-centric perspective with a focus on methods, mappings and metaphors. In the second step we construct a design space for interfaces that deal with spatiotemporal media. In the third step, the utility of this conceptual framework is illustrated by applying it in designing a multi-touch interactive animation system.
2. Animation techniques: an interaction view

This paper has been recommended for acceptance by Andrea Sanna.


Corresponding author. Tel.: +49 421 218 64414.
E-mail addresses: bwf@tzi.de (B. Walther-Franks), malaka@tzi.de (R. Malaka).
1 Tel.: +49 421 218 64402.

http://dx.doi.org/10.1016/j.entcom.2014.08.007
1875-9521/© 2014 Elsevier B.V. All rights reserved.

Computer-based frame animation is the direct successor of traditional hand-drawn animation, and still the main method.
Advances in sensing hardware and processing power have brought entirely new possibilities. Motion capture records the live performance of actors, introducing a new form of animation more akin
to puppetry than traditional animation. Programmed animation
enables realistic simulations to provide interesting secondary
motion and create more believable worlds.
Traditionally, in computer-based keyframe animation, only
extreme poses or key frames need to be manually established by
the animator. Each keyframe is edited using manipulation tools,
which can be specialized for the target domain, e.g. character
poses. Some manipulation tools allow influencing dynamics directly in the scene view. The most common means of specifying dynamics is by using global descriptions, such as time plots or motion paths. Spatial editing between keyframes can be achieved indirectly by editing interpolation functions or by defining a new
key pose.
Motion timing is usually done via global descriptions of dynamics. However, some temporal control techniques directly operate
on the target. Snibbe [58] suggests timing techniques that do not
require time plots but can be administered by directly manipulating the target or its motion path in the scene view. As with spatial
editing, the practicality of temporal editing with displacement
functions depends heavily on the underlying keyframe distribution. Timing by direct manipulation in the scene view is also supported by the latest animation software packages. Tweaking
motion trail handles allows for temporal instead of spatial translation; visual feedback can be given by changing frame numbers
adjacent to the handle. Spatial control of time has also been proposed for video navigation [15].
Motion graphs are two-dimensional plots that map transformation values (vertical axis) against time (horizontal axis). With a
2DOF input device, such a graph thus allows integrated, simultaneous spatiotemporal control. In keyframe animation the motion
editor is the standard way to manage keyframe value interpolation,
typically by means of Bezier curve handles.
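To make this concrete, the following Python sketch (our own illustration, not code from any particular animation package; the keyframe layout and handle convention are assumptions) shows how a motion-editor value channel can be sampled between two keyframes whose interpolation is shaped by cubic Bézier handles.

```python
from dataclasses import dataclass

@dataclass
class Key:
    t: float        # time of the keyframe (horizontal axis of the motion graph)
    v: float        # transformation value (vertical axis)
    out_h: tuple    # outgoing Bezier handle (t, v), shapes the curve after this key
    in_h: tuple     # incoming Bezier handle (t, v), shapes the curve before this key

def bezier(p0, p1, p2, p3, s):
    """Evaluate a cubic Bezier polynomial at parameter s in [0, 1]."""
    u = 1.0 - s
    return u**3 * p0 + 3 * u**2 * s * p1 + 3 * u * s**2 * p2 + s**3 * p3

def sample(k0: Key, k1: Key, t: float, steps: int = 32) -> float:
    """Sample the value channel at time t between keyframes k0 and k1.
    The segment is parametric in both time and value, so we first search
    for the parameter s whose time component is closest to t (as graph
    editors do), then return the value component at that s."""
    best_s, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        s = i / steps
        ct = bezier(k0.t, k0.out_h[0], k1.in_h[0], k1.t, s)
        if abs(ct - t) < best_err:
            best_s, best_err = s, abs(ct - t)
    return bezier(k0.v, k0.out_h[1], k1.in_h[1], k1.v, best_s)

# Ease-in/ease-out between a key at (t=0, v=0) and a key at (t=1, v=1):
k0 = Key(t=0.0, v=0.0, out_h=(0.4, 0.0), in_h=None)
k1 = Key(t=1.0, v=1.0, out_h=None, in_h=(0.6, 1.0))
print(sample(k0, k1, 0.25))   # value lags behind time: the motion eases in
```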
In contrast to keyframe animation, performance animation uses
motion capturing of live performance of an actor or puppeteer by
tracking a number of key points in space over time and combining
them to obtain a representation of the performance. The recorded
data then drives the motion of a digital character. The entire procedure of applying motion capture data to drive an animation is
referred to as performance animation [44]. In a typical setup, an
actor's motion is first recorded, then the data is cleaned, processed and applied to a digital character. Since the digital character can have quite different proportions than the performer, retargeting the motion data is a non-trivial task [24]. In this form of performance animation, capture and application of motion data to an animation are two separate processes; data handling is done offline. Online performance animation immediately applies captured data to a digital character, creating animation instantly, allowing the performer to react immediately to the results or to interact with an audience [59,24]. Processing limitations sometimes entail that performers can often only see a low-fidelity pre-visualization of the final rendering [44].
Many performance animation efforts aim to represent human motion accurately and limit the abstraction to a minimum, and the motion capture performers use only the senses with which they have learned to act (e.g. kinaesthetic and proprioceptive feedback). For performance animation of stylized or non-humanoid characters it is desirable to control them in a less literal fashion. Such a style of performance control is often referred to as computer or digital puppetry [3,59]. Just as traditional puppeteers would rely on mirrors or camera feeds to adjust their performance, computer puppetry requires instant renderings of the applied input to allow performers to adjust their motions. Real-time mappings either use high-bandwidth devices for coordinated control of all character DOF, or employ models based on example data or a physical simulation. One challenge is to control a high number of degrees of freedom (DOF) at the same time.
Real-time control of humanoid characters suggests literal mappings from the puppeteer's physique to the character's skeleton. Non-humanoid characters such as animals, monsters or animate objects are difficult since they have a vastly different morphology and motion style from humans. Seol et al. [55] address this by learning mappings through users mimicking creature motion during a design phase. These learnt mappings can then be used and combined during online puppetry. In similar work, Yamane et al. [66] propose matching human motion data to non-humanoid characters with a statistical model created on the basis of a small set of manually selected and created human-character pose pairs; however, this process is conducted offline. The technique for optimal mapping of a human input skeleton onto an arbitrary character skeleton proposed by Sanna et al. [67] manages without any manual examples and finds the best match between the two based solely on structural similarities.
For animation techniques on desktop input devices, however, typically fewer DOF are available. Recently this has been addressed by multi-touch input devices, which enable techniques for simultaneous rotation, scaling and translation (RST) for 4DOF control of a 2D target [26]. Reisman et al. [52] developed a technique for integrated rotation and translation of 3D content using an arbitrary number of contact points on an interactive surface.
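To illustrate how such integrated RST control can be computed, here is a minimal sketch (our own reconstruction under simplifying assumptions, not the cited techniques): the 4DOF update of a 2D target is derived from the motion of two contact points between two frames.

```python
import math

def rst_from_two_contacts(p1_old, p2_old, p1_new, p2_new):
    """Compute translation, rotation (radians) and uniform scale of a 2D target
    from the old and new positions of two touch contacts ((x, y) tuples)."""
    # Vector between the two contacts before and after the move
    v_old = (p2_old[0] - p1_old[0], p2_old[1] - p1_old[1])
    v_new = (p2_new[0] - p1_new[0], p2_new[1] - p1_new[1])
    # Rotation: change of the angle of the inter-contact vector
    rotation = math.atan2(v_new[1], v_new[0]) - math.atan2(v_old[1], v_old[0])
    # Scale: change of the distance between the contacts
    scale = math.hypot(*v_new) / max(math.hypot(*v_old), 1e-9)
    # Translation: displacement of the contact midpoint
    mid_old = ((p1_old[0] + p2_old[0]) / 2, (p1_old[1] + p2_old[1]) / 2)
    mid_new = ((p1_new[0] + p2_new[0]) / 2, (p1_new[1] + p2_new[1]) / 2)
    translation = (mid_new[0] - mid_old[0], mid_new[1] - mid_old[1])
    return translation, rotation, scale

# A two-finger gesture that drifts right, spreads apart and twists slightly:
print(rst_from_two_contacts((0, 0), (1, 0), (0.1, 0.0), (1.2, 0.1)))
```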
When input devices with fewer DOF than the object parameters are used, integrated control is not possible. This is a common problem in desktop interaction for navigating and editing 3D media, since most desktop input and display devices only have two DOF. Interface designers thus often face the problem of mapping two control DOF to a higher-dimensional target parameter space. A solution is to separate the degrees of control, i.e. splitting object DOF into manageable subsets [4]. With single-pointer input devices, this necessitates sequential control of such subsets, e.g. through displays of multiple orthographic projections of the scene in one split screen or through spatial handles that are overlaid on top of the target object [4].
If high-DOF devices are not available and temporal multiplexing
is not desired, interface designers can choose to constrain the
interaction to reduce required control DOF. A challenge for designers is that the model behind the constraint must be understood by
the user, for instance by basing constraints on mechanisms already known from other contexts.
Yamane and Nakamura [64] present a pin-and-drag interface for posing articulated figures. By pinning down parts of the figure, such as the end-effectors (feet or hands), and dragging others, the whole character can be controlled with relative ease. Joint motion ranges, the current joint configuration and the user-set joint constraints (pins) thus allow constrained control of several character DOF with as few as two position input DOF for a 2D character. The various constraints are prioritized so that dragging constraints are always fulfilled and solved by differential kinematics that give a linear relationship between the constraints and the joint velocities.
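The flavour of such differential kinematics can be sketched as follows (a heavily simplified illustration using a single drag constraint and a damped least-squares step, not the prioritized pin-and-drag solver of Yamane and Nakamura): the Jacobian relates joint velocities linearly to end-effector velocity, and each update moves the dragged end-effector a little towards the finger.

```python
import numpy as np

def fk(angles, lengths):
    """Forward kinematics: joint positions of a planar chain rooted at the origin."""
    pts, pos, theta = [np.zeros(2)], np.zeros(2), 0.0
    for a, l in zip(angles, lengths):
        theta += a
        pos = pos + l * np.array([np.cos(theta), np.sin(theta)])
        pts.append(pos)
    return pts

def drag_step(angles, lengths, target, damping=0.1, gain=0.5):
    """One differential-kinematics update: move the end-effector towards the
    dragged target. The damped pseudo-inverse keeps the step well-behaved
    near singular configurations."""
    pts = fk(angles, lengths)
    end = pts[-1]
    # Jacobian column i: effect of joint i on the end-effector (planar revolute joint)
    J = np.zeros((2, len(angles)))
    for i in range(len(angles)):
        r = end - pts[i]
        J[:, i] = np.array([-r[1], r[0]])
    err = gain * (np.asarray(target, dtype=float) - end)
    JJt = J @ J.T + (damping ** 2) * np.eye(2)
    dq = J.T @ np.linalg.solve(JJt, err)   # damped least-squares joint update
    return angles + dq

angles = np.zeros(3)             # three-joint planar "limb"
lengths = [1.0, 1.0, 0.5]
for _ in range(50):              # iterate towards the drag target
    angles = drag_step(angles, lengths, target=(1.0, 1.5))
print(fk(angles, lengths)[-1])   # end-effector ends up near (1.0, 1.5)
```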
Several research projects have attempted to leave the world of
explicit mappings and enable low-to-high-dimensional control,
bimanual interaction and multi-user interaction implicitly by simulating real-world physics. Frohlich et al. [20] let users kinematically control intermediate objects that are attached to target
objects by springs. The spring attachment is also used by Agrawala and Balakrishnan [1] to enable interaction with a physically simulated virtual desktop, the BumpTop.
Limitations of the motion capture system or of the performer's physiology to produce certain desired motions can be overcome by simulating parts of the body and their interaction with the environment. Ishigaki et al. [31] combine real-time full-body motion capture data, physical simulation and a set of motion examples to create character movement that a user cannot easily perform, such as climbing or swimming. The virtual environment contains predefined interaction points such as the handles of a monkey bar or a rope. Once the character's end-effectors are brought into proximity of an interaction point, control changes so that the character motion is no longer fully controlled by the motion capture. A simplified simulation that treats the intentional contact as a universal joint connected to the character's centre of mass by a linear damped spring enables the calculation of the overall dynamics of the character.
Even when input and output degrees of freedom match, physical interdependencies of input DOF can still limit a mapping. In full
body tracking, the joint locations are dependent on actor size and
body proportions. If the performer's proportions significantly differ from character proportions, this can lead to problems with the character interacting with objects in the scene, such as the floor
or props. For this problem of retargeting of motion capture data
to a new character, Shin et al. [57] propose an approach that maps
input based on a few simple heuristics, e.g., considering the distance of end-effectors to an object in the scene.
For live performances, control needs to be addressed with high-bandwidth input devices or performers acting in parallel. With recorded performances the puppeteer has more options. Capture sequences or just parts of them can be retaken, or slightly modified, and complex motion can be built up in passes. Layered or multi-track animation allows the performer to concentrate on only a small amount of action at a time and create articulated motions step by step. Oore et al. [50] employ layered motion recording for controlling subsets of a character's DOF. For the animation of a humanoid, they divide the character DOF into three parts and animate these sequentially: two 6DOF devices are used to control the motion of both legs, both arms, and torso and head in three passes. Dontcheva et al. [13] make motion layering the principle of their live animation system.
Video games have a strong connection to animation. Most modern video games make heavy use of animation in order to breathe life into the game world. In this sense, games are one application area amongst many others, such as film, television, or education. But animation is also created with and in video games. The actions taken by players and the responses of game elements constitute a form of motion design, often conveying a certain story. This is most evident in game genres where players control characters in a virtual world, like a puppeteer controls a puppet. However, animating for video games differs significantly from animating for film or television. While in film characters and objects are only viewed from a specific camera angle, in interactive media such as video games, the behavior and the view are spontaneously defined by the player. The animator cannot foresee the decisions of the player, which is why he must create animations for all possible player actions that meet certain criteria of completeness and realism. Such motion libraries contain elementary animation sequences that can then be looped, blended and combined in real-time by the game engine [37]. By interactively directing pre-defined animations, players thus essentially perform a kind of digital puppetry with indirect control.
Motion control through high-DOF input devices extends the degree of control, further blurring the lines between gaming and puppetry: as players are able to influence more character DOF, their possibilities for expression are increased. However, while all games use some form of motion capture, few offer the motion editing required in animation practice: if a player is not satisfied with his performance, he will have to do it again. Most games lack techniques for even the basic task of time control, with notable exceptions such as Prince of Persia: Sands of Time [60], Zeit2 [5] and Braid [49], in which the player must navigate time as well as space. Yet while these games incorporate time control in innovative ways, they do not provide the degree of control and editing required for professional animation.
In Machinima, the art of 3D game-based filmmaking, animation and video games ultimately come together to form a novel means of creating animated movies [37]. Using game engines for animation or virtual filming has benefits as well as limitations. Modern 3D games provide a complete game world with physics, animated models, and special effects while offering comparatively simple controls for puppeteering game characters. This gives authors a lot to build upon, as opposed to other methods where animations must be created from scratch. The limitations lie in the dependency on the game developers with their short product cycles, their game engine and assets, and the legal issues involved in using these. Computer puppetry in games remains limited, as is any performance control interface that merely activates and blends predefined animations.
Viewing the state-of-the-art in animation with a coherent focus
on the user, mappings and control DOF is a first step in analyzing the current generation and developing for the next generation of interfaces. The next step is to further structure this treatment: a theoretical framework identifies explicit aspects of interaction in
computer animation tools.

3. A design space for computer animation interfaces


Even though there is an increasing trend in computer graphics research to consider the needs of the artist (e.g. [51,54]), most work on animation interfaces does not consider aspects of HCI. An interaction perspective on computer animation can help to construct a design space of user interfaces for spatiotemporal media. Such a design space can structure the designer's options and aid researchers in analyzing the state of the art.
Existing interface design frameworks cannot be readily used for animation interfaces, as they are either too general or too specific. General frameworks [21,48] span too large a space or only analyze certain aspects of interaction like input devices but not their mapping to output [9], while domain-specific frameworks [8,14] are too focused.
Jacob et al. [33] present a framework for reality-based interaction (RBI) that includes four themes: Naive physics (NP) reflects the innate human understanding of basic concepts of physics, such as gravity, forces and friction; body awareness and skills (BAS) describes our sense of our own body and what we can do with it; environment awareness and skills (EAS) covers how humans perceive and mentally model their environment and place themselves in relation to it; social awareness and skills (SAS) stands for humans as social animals, who generate meaning by relating to other human beings. Considering the four RBI themes for computer animation, many techniques aim to tap the artist's innate understanding of space–time processes, relating to the theme of naive physics (NP). The environment awareness and skills theme (EAS) comes into play as soon as humans interact with these real-world space–time processes. For instance, multi-finger deformation techniques for 2D puppetry on interactive surfaces [45] rely on our natural sense of timing and real-world experience with objects (NP, EAS). In fact any technique based on motion capture for defining dynamics relies on users' intuitive sense of space and time (NP, EAS). Performance controls for digital puppetry use the performer's understanding of their body (BAS). As Kipp and Nguyen [39] illustrate, a puppeteer uses complex coordinated hand movements to bring a wooden puppet to life. Even the technique for low-fidelity input via mouse and keyboard of Laszlo et al. [40] exploits both an animator's motor learning skills and their ability to reason about motion planning (BAS). Collaboration in computer animation is common, as large productions require teams to work together, but does not usually involve close coordination during a single task. Multi-user puppetry interfaces are different in that they tap the ability of humans to relate to other human beings (SAS). The four qualities must be traded off against other desirable qualities, such as expressive power, efficiency, versatility, ergonomics, accessibility and practicality, on a case-by-case basis.
While these themes are relevant for designing any kind of novel interactive systems aiming at reality-based interaction, they are rather general. For a conceptual framework specific to animation it is thus necessary to define a new design space. In the following we discuss the aspects we have identified in our work as relevant to such a framework. We motivate their inclusion and relate them to each other. We will also relate our framework to the RBI framework.

3.1. Aspects of design


Analogous to general models of human–computer interaction, computer animation involves a dialog between a human artist (animator, actor or puppeteer) and the application, a virtual artifact (the animation). This occurs through a hardware–software machine (the animation software and the hardware running it, including input and output periphery). A design framework should consider aspects of these entities and their relations. Fig. 1 shows this basic triangular structure that describes two views of this human–artifact dialog, one that takes the machine as a mediator into account (left and lower edge: artist–machine–artifact) and one that abstracts from it (right edge: artist–artifact). Seven aspects characterize these entities and their relations: task, integration, correspondence, metaphor, directness, orchestration and space–time. In the following we will discuss these seven design aspects and their relevance for HCI and animation tools.
Animation tools for productive use are designed around the task for which they are intended. Decomposition breaks tasks down into subtasks, which can in turn be repeatedly broken down until one arrives at basic tasks at the desired level of decomposition, a procedure frequently used to structure interaction techniques [17,4,29]. At the top level, the main tasks in animation design are motion creation (generating from scratch), motion editing (adapting an existing design) and viewing (for visual feedback on spatial and temporal design). At a lower level, the task decomposition structure varies highly with the type of animation artifact, i.e. character animation or environment effects. Tool generality [53] or

Fig. 1. The design space of animation interfaces characterizes the entities involved
in the interaction and their relations.

versatility [33] characterizes the variety of interaction tasks that can be performed with an interface. This can range from supporting a large number of tasks from varied application domains to supporting only a single, domain-specific task. Tasks are the goal of interaction and aim at creating the animation. Therefore, our design space links the aspect of task to the virtual artifact (Fig. 1).
An input device defines the integration of control – how many DOF can be changed at the same time from the same input source [2]. Performance controls are traditionally very specialized, e.g. using full-body motion capture suits or special hand-puppet input devices [59,34]. Yet research has also brought forward more general controls, such as the 2D multi-point deformation technique of Igarashi et al. [30]. Since computer animation often involves domain objects with large numbers of degrees of freedom (even a simple 3D articulated biped will have around 30 DOF), specialized high-DOF input devices allow for a high level of integration. Ideally the input device should match the structure of the task (Jacob et al. [32]). In most situations the DOF of the input device are not sufficient and solutions like artificial separation or constraining mappings based on a certain model have to be found. If other considerations lead to using lower-DOF input devices, tasks should be adapted accordingly, e.g. by separating translation and orientation [43]. The aspect of integration is mostly construed from the set-up of the input device. We thus locate the aspect of integration next to the machine in the design space (Fig. 1).
Correspondence describes how the morphology of the physical
input through the input device and the resulting response of the
artifact relate [29]. Bodenheimer et al. [3] distinguish performance
animation controls by the degree of abstraction in the sense of correspondence. At the one end of the spectrum, mappings are primarily concerned with the character or style of the motion rather
than literal mappings between performer and target. Such mappings are more commonly used in computer puppetry. At the other
end of the spectrum are efforts to accurately represent motion that
strive to limit the degree of abstraction to a minimum. A high spatial correspondence between input and output requires less mental
effort since it draws on our experience in using our own body and
encountering real-world objects (BAS, EAS). UI designers must face
the tradeoffs between better learnability through high correspondence and the range of motions that can be expressed. The aspect
of correspondence bridges the virtual artifact and the machine
characteristics (machine-artifact edge in Fig. 1).
The metaphor is a notion for describing the mapping of cognitive intentions to physical device interaction using concepts known from other domains [47,4]. In the conversation metaphor the user engages in a written or spoken dialogue with the machine. Such interfaces are well suited for high-level operations, but less suited for spatial precision and expression. Today graphical user interfaces represent the dominating manipulation metaphor, where the user acts upon a virtual world rather than using language as an intermediary. Manipulation interfaces tap our naive understanding of the laws of physics (NP), our motor memories (BAS) and how we perceive and interact with our surroundings (EAS). Manipulation using instruments requires more learning and mental resources, as well as introducing indirection [65,22]. Sensors tracking the user's body promote an embodiment metaphor where the user identifies with parts of a virtual world in a more literal way. For avatar control, embodied interaction builds on our proprioceptive and kinaesthetic senses (BAS), and can aid our feeling of presence in virtual environments (EAS). Embodiment has been picked up in current trends in computer animation that criticize the complex and abstract nature of motion design tools based on the WIMP paradigm. Since the aspect of metaphors is central to the artist's cognitive understanding of his or her activity, our design space links it to the artist in Fig. 1.


Directness characterizes the physical distance between user and the target. This includes both the spatial and the temporal offset from input to output [2]. In our understanding of directness we consider the relation between user (artist) and the physical representation of the animation through the machine (as illustrated on the triangular design space in Fig. 1). Cognitive facets of directness have also been considered in other definitions [22,65], but these can be covered in interaction metaphors.
Since computer animation interfaces deal with continuous or time-based media with multiple spatial and one temporal dimension, interfaces need to support viewing and modeling not only of static spaces but of their dynamics as well. As humans inhabit a space–time continuum, and all our actions always have a temporal dimension, any kind of interaction between a human and a computer to create, edit or view dynamic content relates the human's space–time to the medium's space–time. User time is generally referred to as real time, which is continuous, and the data time as virtual or stream time, which is discrete [42,12]. Depending on animation method and technique, the real time of user input can affect the virtual time or not. Or only either spatial or temporal parameters of the animation are changed. This suggests that there are different ways in which real space–time can be mapped to virtual space–time. So far the literature lacks a structured approach to characterizing the relations of user and artifact space and time. We will therefore propose a taxonomy in the next section that sorts interaction techniques based on which components of real and virtual space–time are involved. This space–time aspect abstracts the relation of user and application from the device level, which is why it is located on the artist–artifact edge of our design space diagram (Fig. 1).
As a central element of our design space, Orchestration describes in which order which parts of the user's body perform which sub-task through which input device. Since humans are most adept at crafting with their hands, and for a long time human–computer interfaces were optimized for manual control, orchestration has been best studied for hand-based interaction. Findings from behavioral psychology show that the dominant and non-dominant hands are optimized for distinct roles in most tasks. For instance, in the task of writing the non-dominant hand first establishes a reference frame relative to which the dominant hand then operates. Using this knowledge in devising bimanual interaction techniques can have benefits for efficiency ([6], Hinckley et al. [68], Balakrishnan and Kurtenbach [69]) and cognition, by changing how users think about a task [35,41]. Many every-day activities also show complex orchestrations of more than just the hands, such as driving a car, where the feet control speed, the hands the steering, and the fingers additional controls such as lights. Since orchestration considers human, application and the mediating device to an equal degree, it is situated at the center of the triangle relation diagram representing the design space (Fig. 2).


Fig. 2. The taxonomy of space–time mappings is structured based on how user input in real space–time controls medium output in virtual space–time. Fig. 3 gives examples of these categories.

3.2. Space–time: a new design aspect

The concept of space–time control mappings considers any navigation, creation or editing operation on a continuous visual medium as a mapping from the real space–time of the input device (the control dimensions) to the virtual space–time of the presentation medium (the presentation dimensions). The output medium's presentation dimensions can be viewed and edited integrally or separately regarding space and time. For instance, while frame-based animation edits poses and the time instants at which they occur separately, performance-based or procedural approaches usually define motion in an integrated fashion. Both real space and time can control either or both virtual space and time. A first step in structuring these relations is to collapse the individual spatial dimensions to a single abstract space dimension, so that we need only consider the two dimensions space and time on user and medium side. The next step is to consider how these two abstract input dimensions (control) affect the output dimensions (presentation). The central idea underlying the construction of categories is that one or both control dimensions can affect one or both presentation dimensions.

Four basic space–time categories of mappings can be constructed from the possible combinations of the two sets (control space, control time) and (presentation space, presentation time):

• space → space
• space → time
• time → space
• time → time

Often presentation space and time will be modified in an integrated fashion, or spatial and temporal control will both figure into the input–output relation. For this we introduce two control-integrated space–time categories that cover input–output mappings in which both control dimensions contribute to the relation

• space–time → space (i.e., space → space and time → space)
• space–time → time (i.e., space → time and time → time)

and two presentation-integrated space–time categories in which both presentation dimensions are affected by the interaction:

• space → space–time (space → space and space → time)
• time → space–time (time → space, time → time)

The final cases are the fully integrated space–time categories

• space–time → space–time (space → space and time → time)
• space–time → time–space (space → time and time → space)

which reflect that integrated control dimensions affecting presentation domains in an integrated way can be matched in two ways. These ten space–time categories cover all variants of mapping user space–time to medium space–time. A simple means of visualizing this is a 3 × 3 matrix, where the central cell is compartmented into two, since relating both control and presentation space and time is ambiguous (Fig. 2).
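As a compact illustration (our own sketch with made-up identifiers, not part of the original framework), the ten categories can be enumerated directly from the two dimension sets; only the central cell of the matrix needs the extra distinction between the two possible matchings.

```python
from itertools import product

DIMS = ("space", "time", "space-time")

def categories():
    """Enumerate the ten space-time mapping categories of the taxonomy."""
    cats = []
    for control, presentation in product(DIMS, DIMS):
        if control == "space-time" and presentation == "space-time":
            # The central cell is ambiguous: integrated control can match
            # integrated presentation in two ways.
            cats.append("space-time -> space-time")
            cats.append("space-time -> time-space")
        else:
            cats.append(f"{control} -> {presentation}")
    return cats

for c in categories():
    print(c)   # prints the ten categories, e.g. "space -> time" (timelines)
```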
The first row of the matrix describes control mappings that only look at the spatial component of the input and do not consider the timing of the user's input. The third row describes control mappings where input has no spatial component, and the user only administers state changes with temporal triggers via controls such as buttons. The second row describes control mappings where spatial input stands in a temporal context. There are borderline cases between temporal and spatiotemporal control: if trigger controls exert spatial changes (such as moving a step in a certain direction), we speak of spatial control.
While some mappings can be easily sorted into these categories, for others it may appear less clear. In the following we consider each category individually and show that it is possible to find examples of actual interfaces for all of them (see also Fig. 3).

Controls in the space → space category use the spatial component of user actions to affect the spatial dimensions of the medium. Most kinds of interactive editing techniques in computer-aided design fall into this category. A straightforward one-to-one mapping of viewer time to medium time (time → time) is video playback. Examples of space → time mappings are timelines that employ a linear spatial representation of time for navigating or altering time-dependent media. Software packages for frame-based animation make heavy use of linear time plots for temporal navigation and timing transformations. Less common are examples for the time → space category. Passive navigation techniques for virtual environments make use of such mappings [4]. After choosing a target or route either automatically or with the user in the loop, the system navigates the user along the route or to the target, mapping user time to medium space. Editing operations are rare in this category, since the single input DOF is insufficient for most editing tasks.
In mapping input space–time for manipulating space only, the redundant DOF can be used either for enhanced robustness or for controlling further parameters. For editing a static image, the temporal component of the user input can, for instance, be used to control the stroke type of the virtual brush (space–time → space). Velocity-based spatial navigation techniques include input space and time in the traversal of virtual space. The presentation time can also be steered: interactive continuous adjustment of playback speed (e.g. via a slider or wheel) changes video or animation playback during playback – spatiotemporal input affects the viewing of medium time (space–time → time).
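A minimal sketch of such a space–time → time control (our own illustration with assumed parameter values): a jog/shuttle-style drag whose spatial offset sets the playback rate, so that spatial and temporal input together steer only the medium's time.

```python
def shuttle_playback(virtual_time, drag_offset_px, dt_real,
                     px_per_unit_rate=200.0, max_rate=4.0):
    """Advance virtual (medium) time by one frame of real time.
    drag_offset_px: horizontal distance of the finger/puck from its rest
    position; its sign and magnitude set playback direction and speed."""
    rate = max(-max_rate, min(max_rate, drag_offset_px / px_per_unit_rate))
    return virtual_time + rate * dt_real

# Dragging 100 px to the right plays forward at half speed (with the defaults):
t = 0.0
for _ in range(60):                       # one second of real time at 60 Hz
    t = shuttle_playback(t, 100.0, 1 / 60)
print(round(t, 3))                        # ~0.5 s of medium time elapsed
```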
Fig. 3. Nine categories of space–time mappings with example applications, techniques and scenarios of use. (Figure contains cropped stills of third party material licensed under CC BY 3.0. Top left, top right and bottom left images attributed to the Durian Blender Open Movie Project; bottom left image attributed to Frontop Technology Co., Ltd; bottom center image attributed to Apricot Blender Open Game Project). Cell contents (applications; techniques; example scenario):
space → space: Spatial Manipulation; Manipulators/Gizmos/Handles; posing a character.
space → space–time: Motion Editing; Graph Editor; adjusting ease-in/ease-out.
space → time: Time Control; Timeline Bar; browsing a video.
space–time → space: Interactive Travel in Static Virtual Environments; Steering; browsing a 3D information space.
space–time → space–time: Performance Animation, Video Games; Computer Puppetry; animating a character.
space–time → time: Time Control; Jog Shuttle; browsing a video.
time → space: Passive Travel in Static Virtual Environments; Target-based Navigation/Fly-Throughs; exploring architectural models.
time → space–time: Passive Travel in Dynamic Virtual Environments; Target-based Navigation/Fly-Throughs; watching a cut-scene in a 3D video game.
time → time: Playback; Triggers/Buttons; watching a video.

The category space → space–time can be found in time plots that are a common means of graphically representing a variable changing over time. Animation packages usually feature a graph editor that enables integrated shifting of key positions and the values they represent in time and one (spatial) dimension. Three-dimensional representations of a video stream, video streamers, even allow space–time video editing [56]. The mapping category time → space–time is realized in automated navigation through a dynamic medium: scripted camera movement through animated scenes navigates both the time and the space of the target medium. It is often used for cut-scenes in video games, so-called cinematics, when interactive control is taken from the player for a short time in favor of progressing the narrative with pre-defined camera movement. This is different from video playback, where the spatial component of the medium (the video frame) is not navigated during playback. While the result is essentially the same, this distinction is down to the fundamental difference in the medium data: for video, the projection from 3D to 2D is already integrated into the visual data (the video frames), while in 3D the projection is determined at run-time.

The space–time → space–time mappings can be found in many examples of user interfaces for virtual worlds. Spatial actions browse or alter the medium's space, and user and medium time are linearly related. Such mappings are common for interfaces that require high user immersion. Most performance controls for integrated motion creation also fall into this category, e.g. in interactive video games or in performance animation. The remaining inverse mapping of users' space–time to virtual time–space does not seem to be used for practical implementations. It could, however, be related to temporal triggers of a user (such as releasing some event) that influence some graphical representation where the user's spatial input controls temporal parameters of the event.
The space–time view of operations on continuous visual media gives a new perspective on the types of such operations: whether they are invasive (editing) or non-invasive (viewing) and whether
they involve creating new designs from scratch or refining existing designs. Firstly, collapsing all spatial parameters into one abstract space dimension hides the fact that, as a rule, both control and medium space involve multiple spatial parameters, while time only constitutes a single quantity on each side. This has an impact on the distribution of invasive versus non-invasive operations in the matrix: techniques employing time as input (third row) are mainly used for passive navigation, rather than for spatial manipulation. This is because space offers more input dimensions and we can navigate space more easily than time. This asymmetry has shaped how we mentally model the abstract dimension of time: we rather think of time in terms of space than vice versa [10]. Secondly, the columns sort mappings into refinement through spatial editing and temporal editing (left and right column), and creation through integrated influence on medium space–time (center column). Thirdly, in many cases the distinction between non-invasive and invasive operations is a theoretical one. A fly-through of a 3D scene can either be seen as a navigation that does not change the dataset or as a camera animation that does. The criteria for distinction should come from the application: is the camera animation being created a part of the medium or is it an ephemeral product of the viewing operation? This distinction has an effect on categorization, too.
3.3. Limitations
The aspects characterizing the design space of animation interfaces constitute a high-level framework. As such they provide a
structure and cues for design reasoning and analysis, rather than
concrete guidelines. In the following we will illustrate its utility
by showing how we used the design space in developing novel animation techniques. More case studies and examples are required to
illustrate its application in the multitude of animation-related
issues.
The design space does not offer a set of orthogonal dimensions; rather, its aspects are interrelated. For example, the nature of the task is linked to the type of space–time mapping: automation can take control away from the user up to the point that spatiotemporal input (e.g. continuous control of a puppet's legs) can be reduced to temporal input (e.g. triggering puppet walk cycles with a button). Another example of such dependencies is that the choice of metaphor determines the magnitude of directness: from indirect manipulation via direct manipulation to embodiment. The interrelation between the seven design aspects may not be surprising, as each can be seen as a perspective on the same issue – designing user interfaces for controlling spatiotemporal phenomena.
The design space presented in this section is a conceptual framework for analyzing and designing animation interfaces. It uses established design aspects identified in the HCI literature. For describing relations of input and animation space–time, which are central to this class of interface, we could not rely on any prior work. For this aspect we developed a taxonomy for sorting mappings into categories based on how they relate input and output space–time. Next we will show how we have used these design aids in practice, both evaluating them as design tools and using them to propose novel animation interfaces.
4. A multi-touch animation system
In order to illustrate the utility of the design space as an aid for designing animation interfaces, we explain how it was employed in the development of a novel animation system that we have presented in prior work (Walther-Franks et al. [70]). We go beyond the original work by explicating the design approach underlying it. The design space-driven approach was chosen in lieu of the first iterations of a human-centered design process. In our experience with proposing novel interaction paradigms, these stages of an iterative design approach have the issue that users are unfamiliar with the possibilities of novel technologies and are strongly biased by existing solutions. The design space can help to guide the first phase of design until users can be provided with artifacts to experience.
Even though free-space 3D input devices have recently become highly popular, in particular in combination with game consoles, they still lack the accuracy and precision of control needed for serious animation editing. Systems like the Kinect are good for high-level avatar control with predefined animations. For more accurate editing, these systems are not yet feasible. Direct-touch interactive surfaces provide better precision for animation tasks, and offer the best preconditions for high directness and correspondence of interaction. The potential of interactive surfaces has been explored for various applications, but only a few consider animation [45,39]. Most surface-based 3D manipulation techniques are not developed and evaluated for motion capture. Furthermore, most projects only look at individual techniques and lack a system perspective. Yet such a perspective is necessary to shed light on real-world problems such as integrating tools into whole workflows or dealing with the realities of software engineering.
4.1. Design approach
Going through the design aspects of our framework, we consider options and make decisions, building up a design approach
to follow for the implementation.
4.1.1. Task
As a typical animation task we decided on performance animation of 3D rigid body models. Working with three-dimensional content poses the challenge of a discrepancy between input space (2D) and output space (3D). In recent years researchers have started investigating 3D manipulation on interactive surfaces, from shallow depth manipulation [27] to full 6DOF control [28,52]. The problem for surface-based motion capture is to design spatial mappings that allow expressive, direct performance control by taking into account the unique characteristics of multi-touch displays.
Many performance control interfaces are designed to optimally suit a specific task, such as walk animation or head motion. This means that for each type of task the performer must learn a new control mapping. This is somewhat supported by specialized devices that afford a certain type of control. For 2DOF input devices like the mouse this is transferred to digital affordances like the handles of a rig. These map more complex changes in character parameters to the translation of manipulators. The specialization is designed into the rig, equating control operations to general translation tasks. Since interactive surfaces have a 2DOF integrated input structure, we copy this approach for our system.
An important secondary task is defining the view on the scene. Since direct-touch performance controls are defined by the current projection, this puts a high demand on view controls regarding flexibility, efficiency and precision. With few exceptions [16,23], research on surface-based 3D interaction has not dealt much with view control. Yet 3D navigation is essential for editing complex scenes in order to acquire multiple perspectives on the target or zoom in on details. Some surface-based virtual reality setups use implicit scene navigation by tracking user head position and orientation. However, this limits the range of control. For unconstrained access to all camera degrees of freedom a manual approach offers the highest degree of control. A common solution is to introduce different modes for object transformation and view transformation (camera panning, zooming, rotation/orbiting). This is prevalent in desktop 3D interaction, where virtual buttons, mouse buttons or modifier keys change between object and view transformations. While zooming and panning cover the camera's three translational DOF, the third rotational DOF, camera roll, is less essential since the camera up vector usually stays orthogonal to a scene ground plane. While in desktop environments this DOF separation is mainly due to low-DOF input devices, it can also be employed on devices that allow more integrated transformation techniques, in order to allow more precise control [46]. We opt for separated control of camera parameters to enable precise view adjustments.
4.1.2. Integration
Multi-touch interactive surfaces provide two control DOF per contact. The combination of multiple points can be used to create integrated controls for 2D and 3D rotation and translation. Yet Martinet et al. [43] point out that multi-touch-based surface interaction cannot truly support integrated 6DOF control. They propose the depth-separated screen-space (DS3) technique which allows translation separate from orientation. Like the Sticky Tools technique of Hancock et al. [28], the number of fingers and where they touch the target (direct) or not (indirect) determines the control mode. Full 3D control can also be achieved by additive motion layering: changing the control-display mapping (e.g. by navigating the view) between takes allows control of further target DOF.
Other important factors for efficiency are easy switching between capture and view operations and dedicating hands to tasks. This requires that a single hand be able to activate different input modes with as little effort as possible. Widgets as an obvious solution produce clutter and interfere with performance controls that already require visual handles. Modal distinction by on- or off-target hit testing can be problematic if the target has an unusual shape or dimensions. In order to distinguish between capture and view control, we employ multi-finger chording in which the number of fingers switches between modes.
4.1.3. Correspondence
Interactive surfaces promote motor and perceptual correspondence between input and output. However, this correspondence
is difficult to maintain when planar input space and higher-dimensional parameter space have to be matched. For a start, users only
interact with two-dimensional projections of three-dimensional
data. For instance, to translate a handle in the screen z-dimension,
one cannot perform the equivalent motion with standard sensing
hardware. The problem with the third dimension on interactive
surfaces is that barring above-the-surface input, manipulations in
the screen z dimension cannot maintain this correspondence, since
input motions can only occur in a plane. Following the integrality
of touch input, this means that the 2 input DOF need to be mapped
to 2 translation parameters of the target (e.g. the handle of a character rig) so that they follow the same trajectory.
4.1.4. Metaphor
The congruent input and output space of direct input devices
promotes a manipulation style of interaction. Most manipulation
techniques for interactive surfaces are kinematic mappings, where
individual surface contacts exert a pseudo friction force by sticking
to objects or pinning them down. As an alternative to kinematic
control, Cao et al. [7] and Wilson et al. [63] propose surface-based
manipulation through virtual forces. This offers a more comprehensive and realistic simulation of physical forces and is also used
in desktop-based and immersive virtual environments. Different
metaphors in the same system can enhance the distinction
between controls that otherwise have much in common. For
instance, in the example of desktop 3D interaction, editing usually
employs the direct or instrumented interaction metaphors, while
view controls bear more resemblance to steering. This could also support the mental distinction between phenomenologically similar spatial editing and navigation operations on interactive surfaces.
Manipulation is the most general metaphor for puppet control. Through manipulation the puppeteer can flexibly create and release mappings with a drag-and-drop style of interaction; directness minimizes mediation between user and target domain. For complex transformations, as is often necessary in character animation, rigs should be designed so that handles promote as direct a manipulation as possible – meaning that handles should be co-located with the features they influence and the handle-feature mapping designed to support maximal correspondence. Regarding kinematic versus physics-based manipulation mappings, realism and emergent control styles stand against precision, predictability and reliability. In animation, full control has a higher priority than realism, which is why we opt for purely kinematic controls.
4.1.5. Directness
Interactive surfaces can reduce the distance between the user and the target to a minimum. However, touch input also has potential disadvantages such as imprecision (when mapping the finger contact area to a single point) and occlusion of on-screen content by the user's fingers, hands and arms [62]. Re-introducing indirection can alleviate the occlusion problem. Since absolute input techniques require the user to reach every part of the screen, which may become difficult when the display exceeds a certain size, limiting the area of interaction to a part of the screen or adding indirection mechanisms can help [18]. The spatial distance between input and target can also be used as a parameter for interaction design. For instance, fingers or pens touching the target can control different DOF than off-target contacts (mode change). Layered motion recording can involve manipulating moving targets after the initial capture pass. Relative mapping applies transformation relative to the initial input state. This allows arbitrary input location, and clutching can increase the comfort of use. Both absolute and relative input can be applied locally and globally, which makes a significant difference when controlling the behavior of a feature that inherits motion from its parents. Local mapping allows the user to ignore motion of parent features and concentrate on local transformations. By default, performance control of a feature overwrites any previous recordings made for it. In this way, performers can practice and test a motion until they get it right. They might however want to keep aspects of an original recording and change others. Blending a performance with a previous recording expands the possibilities for control. It allows performance-based editing of existing animations.
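To make the layering options concrete, the following sketch (our own illustration, not the system's actual data model) shows how a new performance pass over a recorded value channel can overwrite it, be added on top of it, or be blended with it.

```python
def layer_channel(previous, performance, mode="absolute", blend=0.5):
    """Combine a previously recorded value channel with a new performance pass.
    Both are lists of samples at the same frames; extra frames are appended."""
    out = list(previous)
    for i, new in enumerate(performance):
        if i >= len(out):
            out.append(new)
        elif mode == "absolute":          # overwrite the old recording
            out[i] = new
        elif mode == "additive":          # layer the new motion on top
            out[i] = out[i] + new
        elif mode == "blend":             # keep aspects of the old take
            out[i] = (1.0 - blend) * out[i] + blend * new
    return out

base = [0.0, 1.0, 2.0, 3.0]               # e.g. a joint angle over four frames
retake = [0.5, 0.5, 0.5, 0.5]
print(layer_channel(base, retake, "absolute"))    # [0.5, 0.5, 0.5, 0.5]
print(layer_channel(base, retake, "additive"))    # [0.5, 1.5, 2.5, 3.5]
print(layer_channel(base, retake, "blend", 0.25)) # [0.125, 0.875, 1.625, 2.375]
```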
4.1.6. Orchestration
Studies by Forlines et al. [19] and Kin et al. [38] demonstrated that the benefits of two-handed (symmetric) input also transfer to interactive surfaces for basic selection and dragging tasks. The difficulty is to get users to use both hands, since single-handed controls in typical UIs can prime them. To maximize the options, our system should allow one-handed as well as symmetrical and asymmetrical bimanual input. The 2D capture approach implies that no single spatial manipulation requires more than a single hand. Consequently, two single-handed operations can easily be combined to enable parallel operation, for instance one hand per character limb, allowing emergent asymmetric and symmetric control (cf. [11]).
If individual sets of camera parameters are controlled with a single hand, this allows emergent styles of interaction. Combining two different camera operations, one with each hand, allows asymmetric view control. For instance, left-hand panning and right-hand zooming can be combined into simultaneous 3DOF view control. A combination of left-handed view control with right-handed performance control even enables interaction styles that follow principles of asymmetric bimanual behavior [25]: the left hand can operate the view, which will be at a lower spatial and temporal frequency and with precedence to the right hand, which acts in the reference frame provided by the left. This approach can be used to simplify view attaching for editing in dynamic reference frames: attaching the camera to the current reference frame for all camera operations provides the benefits of kinaesthetic reference frames and solves the issue of direct control with dynamic targets.
4.1.7. Space–time
Direct-touch spatial editing is almost exclusively evaluated in the scope of basic object editing in static environments (space → space). Non-spatial trigger input by tapping the screen (time → time) is commonly employed for discrete navigation of image sequences or videos, e.g. TV sports presenters reviewing video recordings of a game. With the exception of Moscovich et al. [45] and Kipp and Nguyen [39], the potential of direct touch for motion capture (space–time → space–time) has received little attention in prior research. Surface-specific techniques thus seem mainly aligned along symmetric space–time categories. The absence of passive, time-based mappings or graphical depictions of time might be just because the coupling of input and output so strongly affords direct, continuous manipulation as opposed to tool use or automation. While it is still pure conjecture, it is possible that direct touch promotes symmetric space–time mappings which couple user and medium space and time more literally, while indirect input might be better suited for more mediated space–time controls.
4.2. Prototype system
We implemented the design approach in a working prototype of a multi-touch animation system (Walther-Franks et al. [70]). We decided to build upon the existing 3D modelling and animation software Blender. The animation system is built around a core of performance controls. View controls and a time control interface complete the basic functionality. Each control can be operated with a single hand. This allows the user to freely combine two operations, e.g. capturing the motion of two features at once or wielding the view and the puppet at the same time. Since Blender supports neither multi-touch input nor concurrent operations, changes were necessary to its user interface module, especially the event system.
We established a TUIO-based multi-touch interface. TUIO is an open, platform-independent framework that defines a common protocol and API for tangible interfaces and multi-touch surfaces [36]. It is based on the Open Sound Control (OSC) protocol, an emerging standard for interactive environments. We implemented chording techniques for mouse emulation by mapping multiple finger cursors to single 2-DOF input events. This suffices for single-hand input. For bimanual interaction the contacts are clustered using a spatial as well as a temporal threshold. Fingers are only added to a gesture if they are within a certain distance of the centroid of the gesture's cursor cluster; otherwise they create a new multi-finger gesture. After initial registration the gesture can be relaxed, i.e. the finger constellation required for detection need not be maintained during the rest of the continuous gesture. This means that adding a finger to or removing one from the cluster will not change the gesture, making continuous gestures resistant to tracking interruptions or touch pressure relaxation. This multi-touch integration already enables the use of tools via multi-touch gestures with one hand at a time. For two-handed control it was necessary to extend the single-pointer UI paradigm implemented in Blender such that two input sources (two mice or two hands) can operate independently and in parallel.
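For illustration, the clustering step could look roughly like the following minimal Python sketch; the thresholds, class and function names are illustrative assumptions and not taken from the prototype's source:

import math, time

SPATIAL_THRESHOLD = 120.0   # max distance (pixels) from a cluster's centroid (assumed value)
TEMPORAL_THRESHOLD = 0.25   # max age (seconds) of a cluster to still accept fingers (assumed value)

class Cluster:
    def __init__(self, x, y, t):
        self.points = [(x, y)]
        self.created = t

    def centroid(self):
        xs, ys = zip(*self.points)
        return sum(xs) / len(xs), sum(ys) / len(ys)

def assign_contact(clusters, x, y, t=None):
    """Add a new finger contact to an existing cluster (roughly one cluster
    per hand) if it is close enough in space and time; otherwise start a
    new multi-finger gesture."""
    t = time.monotonic() if t is None else t
    for c in clusters:
        cx, cy = c.centroid()
        if math.hypot(x - cx, y - cy) <= SPATIAL_THRESHOLD and t - c.created <= TEMPORAL_THRESHOLD:
            c.points.append((x, y))
            return c
    c = Cluster(x, y, t)
    clusters.append(c)
    return c

Once a cluster is registered as a gesture, later additions or removals of fingers would simply be ignored for gesture identity, which is what makes the relaxation described above possible.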
Performance controls use selection and translation operators (Fig. 4). The translation operator works along the two axes defined by the view plane. Single-finger input maps to selection (tap) and translation (drag). In linked feature hierarchies such as skeleton rigs, the translation is applied to the distal bone end, rotating the bone around the screen z axis. Dragging directly on a target enables selection and translation in a single fluid motion. Alternatively, the drag gesture can be performed anywhere on screen, also allowing indirect control of a previously selected target. Indirect dragging thus requires prior selection to determine the input target. Selection is the only context-dependent operator, as it determines the target by ray casting from the tapped screen coordinates.
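As an illustration of how a 2-DOF drag maps onto the view plane, consider the following minimal Python sketch; the vector names and the pixel-to-scene scale are illustrative assumptions rather than the prototype's actual code:

def view_plane_translation(delta_x, delta_y, cam_right, cam_up, scale=0.01):
    """Map a 2-DOF screen-space drag onto the two axes spanned by the view
    plane: screen x moves the target along the camera's right vector,
    screen y along its up vector. cam_right and cam_up are unit 3-vectors;
    scale converts pixels to scene units (assumed constant here)."""
    return tuple(scale * (delta_x * r + delta_y * u)
                 for r, u in zip(cam_right, cam_up))

# A tap, by contrast, would pick its target by casting a ray from the
# touched pixel into the scene (hypothetical helper, not shown):
# target = ray_cast(camera, tap_x, tap_y)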
Layered animation is supported via absolute and additive mappings. Absolute mode is the standard mapping; additive mode must be activated via the GUI. The standard absolute mapping overwrites any previous transformation at the current time. In the absence of parent motion this ensures 1:1 correspondence between input and output. With parent motion, control becomes relative to the parent frame of reference (local). Additive layering preserves existing motion and adds the current relative transformation to it. By changing the view between takes so that the input–output mapping affects degrees of freedom that could not be affected in previous takes (e.g. by orbiting the view 90 degrees around the screen y axis), the animator can add depth and thus create more three-dimensional motion.
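The difference between the two write modes can be illustrated by a minimal Python sketch of recording one sample into an animation channel; the channel representation and names are illustrative assumptions:

def write_sample(channel, frame, live_value, mode="absolute"):
    """Record one performance sample into an animation channel (here a dict
    mapping frame -> value). Absolute mode overwrites whatever was stored;
    additive mode keeps the existing motion and treats live_value as a
    relative offset that is added to it."""
    if mode == "absolute":
        channel[frame] = live_value
    elif mode == "additive":
        channel[frame] = channel.get(frame, 0.0) + live_value
    else:
        raise ValueError("unknown layering mode: %s" % mode)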
The three camera operators pan, orbit and zoom map to two-, three-, and four-finger gestures (Fig. 5). Assigning chorded multi-finger gestures to view operators has no precedent in the real world or prior work, and there are good arguments for different choices. A sensible measure is the frequency of use of a certain view control, and thus one could argue that the more commonly used functions should be mapped to the gestures with a smaller footprint, i.e. fewer fingers. Camera dolly or zoom is probably the least used view control, which is why we decided to map it to the four-finger gesture: users can zoom in and out by moving four fingers up or down along screen y. Three fingers allow camera orbit via the turntable metaphor: movement along the screen x axis controls turntable azimuth, while motion along screen y controls camera altitude. Two fingers pan the view along the view plane x and y axes. Like transformation controls, camera controls are context-free, meaning they can be activated anywhere in the camera view.
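A minimal Python sketch of this dispatch, with illustrative gains and a simplified camera state (none of these names are taken from the prototype), could look as follows:

def camera_operator(finger_count, dx, dy, cam):
    """Dispatch a chorded multi-finger drag to a view operation:
    two fingers pan, three orbit (turntable), four zoom/dolly.
    cam is a dict with 'pan' (a 2-element list), 'azimuth', 'altitude'
    and 'distance'; the gain constants are purely illustrative."""
    if finger_count == 2:
        cam['pan'][0] += 0.01 * dx
        cam['pan'][1] += 0.01 * dy
    elif finger_count == 3:
        cam['azimuth'] += 0.5 * dx       # screen x turns the turntable
        cam['altitude'] += 0.5 * dy      # screen y raises/lowers the camera
    elif finger_count == 4:
        cam['distance'] *= 1.0 - 0.005 * dy   # drag along screen y to dolly in/out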
A view attachment mode, when active, fixes the view camera to the currently selected feature during all camera operations, moving the camera along with dynamic targets (Fig. 6). The camera–feature offset is maintained and can be continuously altered depending on the camera operator, as described above. After establishing the attachment by starting a view control gesture, new targets can be selected and manipulated. Releasing the camera control immediately ends the attachment, rendering the camera static. By combining one-handed view control and capture in an asymmetric manner, this approach resolves the indirection in controlling dynamic targets.
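Conceptually, the attachment amounts to storing the camera's pose relative to the feature and re-applying it every frame. A minimal sketch using NumPy 4 × 4 matrices (an assumption for illustration, not the prototype's implementation):

import numpy as np

def attach_view(camera_world, feature_world):
    """Store the camera's pose relative to the feature at attach time
    (both arguments are 4x4 world matrices)."""
    return np.linalg.inv(feature_world) @ camera_world

def follow(feature_world, offset):
    """Each frame, place the camera by re-applying the stored offset, so it
    travels with the animated feature and direct control stays possible in
    the feature's (dynamic) reference frame."""
    return feature_world @ offset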
The time control interface features several buttons and a timeline. Simple play/pause toggle buttons start and stop the playback within a specified time range. A timeline gives the animator visual feedback on the remaining loop length in multi-track capture, supporting anticipation. It also enables efficient temporal navigation: with a one-finger tap the animator can set the playhead to a specific frame, and a continuous horizontal gesture enables interactive playback with direct control of playback speed.
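Both interactions reduce to simple mappings from touch coordinates to frame numbers; a minimal Python sketch, in which the widget geometry and gain values are illustrative assumptions:

def timeline_tap(x, timeline_x, timeline_width, frame_start, frame_end):
    """Map a tap position on the timeline widget to a frame number."""
    u = min(max((x - timeline_x) / timeline_width, 0.0), 1.0)
    return round(frame_start + u * (frame_end - frame_start))

def scrub(current_frame, dx, frames_per_pixel=0.25):
    """A continuous horizontal drag advances the playhead proportionally to
    finger movement, giving interactive control over playback speed."""
    return current_frame + dx * frames_per_pixel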

5. Evaluation
The design framework was a powerful aid for structuring design
options for the novel multi-touch animation system presented
above. We have also used it in the design of a performance-based
animation timing technique (Walther-Franks et al. [71]) and are employing it in ongoing projects.

Fig. 4. Direct and indirect performance control.

Fig. 5. Basic view transformations with continuous multi-finger gestures.

Fig. 6. The view attaching technique. Features can inherit motion from parents animated in previous motion layers. In such cases direct control is not possible. By attaching the view to the feature's frame of reference, direct control is reintroduced.

A design framework as presented
in this paper cannot be directly evaluated. Its usefulness and appropriateness are rather demonstrated indirectly through evaluations of prototypical systems built on its theoretical foundation. For this reason we will next summarize the evaluation of the multi-touch animation system.
We evaluated the resulting system in an informal user study. Aspects of interest were the reception and use of single- and multi-track capture and camera controls, specifically to what extent two-handed interaction strategies would be employed. Since the direct animation system is highly novel and still at the prototype stage, a formative evaluation was chosen in order to guide further research. Formative evaluations are common in research and development of 3D user interfaces [4]. Six right-handed individuals aged between 23 and 31 years, four male and two female, took part in our study. All came from a computer science and/or media production background. Two judged themselves frequent users of animation software, one an occasional user, and three as rarely using such software. In sessions of about 30 min, the users created free animations of a stylized human puppet. An articulated mannequin was rigged with seven handles that provided puppetry controls (three bones for control of the body and four inverse kinematics handles for the hand and foot end effectors). The inverse kinematics handles allowed expressive control of the multi-joint limbs while keeping complexity at a minimum. The goal was to explore what animation goals users would come up with on their own, given the digital puppet. The study ran the prototype on a rear-projected horizontal interactive tabletop employing the diffuse illumination technique, with a height of 90 cm, a screen diagonal of 52 inches and a resolution of 1280 × 800 pixels.
The results of the study revealed that participants took to the controls easily. Most stated that they enjoyed using our system. The performance control interface was straightforward for initial animations. Multi-track animation was mainly used to animate separate features in multiple passes, less to adjust existing animation. The more complex additive mapping was hardly used and met with initial confusion, although explanation and experimenting usually resolved this. The view controls were quickly understood and were used without difficulty. The most commonly used camera operation was orbit. As all participants were familiar with the timeline metaphor, they had no problems understanding it. Most subjects easily employed the absolute positioning of the playhead to jump to a frame and to scrub along the timeline to review the animation they had created. One participant used the timeline for a method of animation somewhere between performance and frame-based animation: using the left hand for the playhead and the right for pose control, he executed a fast, efficient pose-to-pose animation style. Five out of six participants exhibited asymmetric bimanual styles of interaction. An emergent strategy of half of our study's participants was to dedicate the left hand to view or time controls and the right to capture. Further, one participant controlled two puppet features simultaneously. Three used their left hand to attach the view to the mannequin for animating its limbs once they had created animation for the root bone. The benefit of locking the view to a frame of reference in this way seemed immediately apparent to them, and was greeted with enthusiasm in two cases.
Given the short timeframe and their lack of experience in performance animation, participants were able to create surprisingly refined character motion. Four were able to create expressive character animations within the short timeframe of 10 min in the free animation task. These included walk, jump and squat motions as well as dance moves.
Inexperienced users had a harder time comprehending spatial relationships, while those with more experience in 3D animation noticeably picked up the controls more fluently. This comes as no surprise, as using and controlling software takes time and practice, regardless of the interface. For novice and casual users, our 2DOF strategy seems appropriate, since it constrains manipulation by excluding the depth dimension. However, the interface might need improvements in visualizing these constraints and giving more hints on depth cues.

6. Conclusion and discussion


Current animation systems are too complex and inefficient for the high demand for animated content today. In order to make them more efficient and accessible to a broad range of users, we have to look at such tools from an HCI perspective. Our work has taken steps in this direction. A review summarized related work on computer animation interfaces regarding issues of control and use. A design space characterized important aspects of animation interfaces on varying levels of abstraction. A taxonomy for spacetime interactions with spatiotemporal media described how user and medium space and dynamics relate in animation interfaces. The use of this conceptual framework was demonstrated in the design of a multi-touch animation system. For this proof-of-concept prototype we used interactive surfaces as high-bandwidth direct input devices. It features robust, easy-to-understand, and conflict-free unimanual mappings for performance and view control that can be combined for efficient bimanual interaction. A user study verified the design approach by showing largely positive user reactions.
The majority of users employed both hands in emergent asymmetric and symmetric bimanual interaction.
Animations are created by people for people in order to inform,
educate or entertain. Striving for higher usability by applying
knowledge on physiological and psychological human factors is
the foundation of human–computer interaction, and one of the
main points of our work. However, animation is primarily still an
art and a craft. Just as good animations have always been created
by artists with capability and skill, next generation animation
interfaces will still require talent and training on the part of the user.
But in contrast to current mainstream tools they can help to ease
the effort in training and allow animators to express their creativity more efficiently. While animation tools cannot enable completely uninitiated people to create stunning motion designs without significantly constraining creativity, they can do a lot more to make the learning curve less steep. We believe that next generation tools should accommodate everyone from beginners to experienced professionals by being easy to learn, but hard to master. In this we agree with voices in the community that aim, rather than making systems easy to use, to accelerate the progress from novices to experts [35], by letting users feel like naturals [62].
Acknowledgement
This work was funded in part by the Klaus Tschira Stiftung.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.entcom.2014.08.007.
References
[1] Anand Agarawala, Ravin Balakrishnan, Keepin' it real: pushing the desktop metaphor with physics, piles and the pen, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '06, ACM, New York, NY, USA, 2006, pp. 1283–1292.
[2] Michel Beaudouin-Lafon, Instrumental interaction: an interaction model for
designing post-WIMP user interfaces, in: Proceedings of the SIGCHI conference
on Human Factors in Computing Systems, CHI 00, ACM, New York, NY, USA,
2000, pp. 446453.
[3] B. Bodenheimer, C. Rose, S. Rosenthal, J. Pella, The process of motion capture:
dealing with the data, in: D. Thalmann, M. van de Panne (Eds.), Computer
Animation and Simulation 97. Eurographics/ACM SIGGRAPH, 1997.
[4] Doug.A. Bowman, Ernst. Kruijff, Joseph.J. LaViola, Ivan. Poupyrev, 3D User
Interfaces: Theory and Practice, Addison-Wesley, 2004.
[5] Brightside Games. Zeit2. Ubisoft, 2011.
[6] W. Buxton, B. Myers. A study in two-handed input, in: Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems, CHI 86, ACM,
New York, NY, USA, 1986, pp. 321326.
[7] Xiang Cao, Andrew D. Wilson, Ravin Balakrishnan, Ken Hinckley, Scott E.
Hudson, ShapeTouch: leveraging contact shape on interactive surfaces, in:
2008 IEEE International Workshop on Horizontal Interactive Human Computer
Systems (TABLETOP). IEEE, October 2008, pp. 129136.
[8] S. K. Card, J. Mackinlay. The structure of the information visualization design
space, in: Information Visualization, 1997. Proceedings., IEEE Symposium on ,
volume 0, IEEE, Los Alamitos, CA, USA, October 1997, pp. 9299.
[9] Stuart.K. Card, Jock.D. Mackinlay, George.G. Robertson, A morphological
analysis of the design space of input devices, ACM Trans. Inf. Syst. 9 (2)
(April 1991) 99122.
[10] Daniel. Casasanto, Lera. Boroditsky, Time in the mind: using space to think
about time, Cognition 106 (2) (February 2008) 579593.
[11] Lawrence D. Cutler, Bernd Fröhlich, Pat Hanrahan, Two-handed direct manipulation on the responsive workbench, in: SI3D '97: Proceedings of the 1997 Symposium on Interactive 3D Graphics, ACM, New York, NY, USA, 1997, pp. 107–114.
[12] J.D.N. Dionisio, A.F. Cardenas, A unied data model for representing
multimedia, timeline, and simulation data, IEEE Trans. Knowledge Data Eng.
10 (5) (September 1998) 746767.
[13] Mira Dontcheva, Gary Yngve, Zoran Popović, Layered acting for character animation, ACM Trans. Graph. 22 (3) (July 2003) 409–416.
[14] Tanja Döring, Axel Sylvester, Albrecht Schmidt, A design space for ephemeral user interfaces, in: Proceedings of the 7th International Conference on Tangible, Embedded and Embodied Interaction, TEI '13, ACM, New York, NY, USA, 2013, pp. 75–82.
[15] Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai,
Ravin Balakrishnan, Karan Singh. Video browsing by direct manipulation, in:
Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, CHI 08, ACM, New York, NY, USA, 2008, pp. 237246.
[16] J. Edelmann, A. Schilling, S. Fleck. The DabR a multitouch system for intuitive
3D scene navigation, in: 3DTV Conference. The True Vision Capture,
Transmission and Display of 3D Video, 2009. IEEE, May 2009, pp. 14.
[17] James.D. Foley, Andries. van Dam, Steven.K. Feiner, John F. Hughes, Computer
Graphics Principles and Practice, Addison-Wesley, 1996.
[18] Clifton Forlines, Daniel Vogel, Ravin Balakrishnan. Hybrid Pointing: uid
switching between absolute and relative pointing with a direct input device,
in: UIST 06: Proceedings of the 19th Annual ACM Symposium on User
Interface Software and Technology, ACM, New York, NY, USA, 2006, pp. 211
220.
[19] Clifton Forlines, Daniel Wigdor, Chia Shen, Ravin Balakrishnan. Direct-touch
vs. mouse input for tabletop displays, in: Proceedings of the SIGCHI Conference
on Human Factors in Computing Systems, CHI 07, ACM, New York, NY, USA,
2007, pp. 647656.

[20] B. Fröhlich, H. Tramberend, A. Beers, M. Agrawala, D. Baraff, Physically-based manipulation on the responsive workbench, in: IEEE Virtual Reality 2000, IEEE Comput. Soc., Los Alamitos, CA, USA, 2000, pp. 5–11.
[21] David M. Frohlich, The design space of interfaces, in: Lars. Kjelldahl (Ed.),
Multimedia, Eurographic Seminars, Springer, Berlin Heidelberg, 1992, pp. 53
69.
[22] David M. Frohlich, Direct manipulation and other lessons, in: Martin G. Helander, Thomas K. Landauer, Prasad V. Prabhu (Eds.), Handbook of Human–Computer Interaction, Elsevier, North-Holland, 1997, pp. 463–488.
[23] Chi W. Fu, Wooi B. Goh, Junxiang A. Ng. Multi-touch techniques for exploring
large-scale 3D astrophysical simulations, in: Proceedings of the 28th
international conference on Human factors in computing systems, CHI 10,
ACM, New York, NY, USA, 2010, pp. 22132222.
[24] Michael. Gleicher, Animation from observation: motion capture and motion
editing, SIGGRAPH Comput. Graph. 33 (4) (November 1999) 5154.
[25] Y. Guiard, Asymmetric division of labor in human skilled bimanual action: the
kinematic chain as a model, J. Motor Behav. 19 (4) (December 1987) 486517.
[26] Marc S. Hancock, F. D. Vernier, Daniel Wigdor, Sheelagh Carpendale, and Chia
Shen. Rotation and translation mechanisms for tabletop interaction, in:
Horizontal Interactive HumanComputer Systems, 2006. TableTop 2006.
First IEEE International Workshop on, 8 pp+. IEEE, January 2006.
[27] Mark Hancock, Sheelagh Carpendale, Andy Cockburn. Shallow-depth 3d
interaction: design and evaluation of one-, two- and three-touch techniques,
in: Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, CHI 07, ACM, New York, NY, USA, 2007, pp. 11471156.
[28] Mark Hancock, Thomas T. Cate, Sheelagh Carpendale. Sticky tools: Full 6DOF
force-based interaction for multi-touch tables, in: Proceedings of Interactive
Tabletops and Surfaces 2009, 2009.
[29] Ken. Hinckley, Daniel. Wigdor, Input Technologies and Techniques, Taylor &
Francis, 2012. Chapter 9.
[30] Takeo. Igarashi, Tomer. Moscovich, John.F. Hughes, As-rigid-as-possible shape
manipulation, ACM Trans. Graph. 24 (3) (2005) 11341141.
[31] Satoru Ishigaki, Timothy White, Victor B. Zordan, C. Karen Liu, Performancebased control interface for character animation, ACM Trans. Graph. 28 (3)
(2009) 18. July.
[32] Robert J.K. Jacob, Linda E. Sibert, Daniel C. McFarlane, M. Preston Mullen,
Integrality and separability of input devices, ACM Trans. Comput. Hum.
Interact. 1 (1) (1994) 326. March.
[33] Robert J. K. Jacob, Audrey Girouard, Leanne M. Hirsheld, Michael S. Horn, Orit
Shaer, Erin T. Solovey, Jamie Zigelbaum. Reality-based interaction: a
framework for post-WIMP interfaces, in: Proceedings of the Twenty-sixth
Annual SIGCHI Conference on Human Factors in Computing Systems, CHI 08,
ACM, New York, NY, USA, 2008, pp. 201210.
[34] John Jurgensen. From muppets to digital puppets, August 2008. URL http://
www.youtube.com/watch?v=GN8WbHomQJg.
[35] Paul Kabbash, William Buxton, Abigail Sellen. Two-handed input in a
compound task, in: Proceedings of the SIGCHI Conference on Human Factors
in Computing Systems, CHI 94, ACM, New York, NY, USA, 1994, pp. 417423.
[36] Martin Kaltenbrunner, Till Bovermann, Ross Bencina, Enrico Costanza. TUIO
a protocol for table based tangible user interfaces, in: Proceedings of the 6th
International Workshop on Gesture in HumanComputer Interaction and
Simulation (GW 2005), Vannes, France, 2005.
[37] Matt Kelland, Dave Morris, Dave Lloyd Machinima, Making Animated Movies
in 3D Virtual Environments, Ilex, Lewes, 2005.
[38] Kenrick Kin, Maneesh Agrawala, Tony DeRose. Determining the benets of
direct-touch, bimanual, and multinger input on a multitouch workstation, in:
Proceedings of Graphics Interface 2009, GI 09, Canadian Information
Processing Society, Toronto, Ontario, Canada, Canada, 2009, pp. 119124.
[39] Michael Kipp, Quan Nguyen. Multitouch puppetry: creating coordinated 3D
motion for an articulated arm, in: ACM International Conference on Interactive
Tabletops and Surfaces, ITS 10, ACM, New York, NY, USA, 2010, pp. 147156.
[40] Joseph Laszlo, Michiel van de Panne, Eugene Fiume. Interactive control for
physically-based animation, in: SIGGRAPH 00: Proceedings of the 27th Annual
Conference on Computer Graphics and Interactive Techniques, ACM Press/
Addison-Wesley Publishing Co., New York, NY, USA, 2000, pp. 201208.
[41] Andrea Leganchuk, Shumin Zhai, William Buxton, Manual and cognitive
benets of two-handed input: an experimental study, ACM Trans. Comput.
Hum. Interact. 5 (4) (1998) 326359. December.
[42] Thomas D.C. Little, in: Time-based Media Representation and Delivery, ACM
Press/Addison-Wesley Publishing Co., New York, NY, USA, 1994, pp. 175200.
[43] A. Martinet, G. Casiez, L. Grisoni, Integrality and separability of multitouch
interaction techniques in 3D manipulation tasks, IEEE Trans. Vis. Comput.
Graph. 18 (3) (March 2012) 369380.
[44] Alberto Menache. Understanding motion capture for computer animation.
2011.
[45] T. Moscovich, T. Igarashi, J. Rekimoto, K. Fukuchi, J. F. Hughes. A multi-nger
interface for performance animation of deformable drawings, in: UIST 2005
Symposium on User Interface Software and Technology, October 2005.
[46] Miguel A. Nacenta, Patrick Baudisch, Hrvoje Benko, Andrew D. Wilson.
Separability of spatial manipulations in multi-touch interfaces, in: GI 09:
Proceedings of Graphics Interface 2009, Canadian Information Processing
Society, Toronto, Ontario, Canada, Canada, 2009, pp. 175182.
[47] Dennis C. Neale, John M. Carroll, in: The Role of Metaphors in User Interface Design, 2nd ed., Elsevier, Amsterdam, North-Holland, 1997, pp. 441–462.
[48] Laurence Nigay, Joelle Coutaz, A design space for multimodal systems: concurrent processing and data fusion, in: Proceedings of the INTERACT '93 and CHI '93 Conference on Human Factors in Computing Systems, CHI '93, ACM, New York, NY, USA, 1993, pp. 172–178.
[49] Number None, Inc., Braid. Microsoft Game Studios, 2008.
[50] Sageev Oore, Demetri Terzopoulos, Geoffrey Hinton, A desktop input device and interface for interactive 3D character animation, in: Proc. Graphics Interface, 2002, pp. 133–140.
[51] Jovan Popović, Steven M. Seitz, Michael Erdmann, Motion sketching for control of rigid-body simulations, ACM Trans. Graph. 22 (4) (October 2003) 1034–1054.
[52] Jason L. Reisman, Philip L. Davidson, Jefferson Y. Han, A screen-space formulation for 2D and 3D direct manipulation, in: Proceedings of the 22nd Annual ACM Symposium on User Interface Software and Technology, UIST '09, ACM, New York, NY, USA, 2009, pp. 69–78.
[53] Johannes Schöning, Frank Steinicke, Antonio Krüger, Klaus Hinrichs, Dimitar Valkov, Bimanual interaction with interscopic multi-touch surfaces, in: Tom Gross, Jan Gulliksen, Paula Kotzé, Lars Oestreicher, Philippe Palanque, Raquel O. Prates, Marco Winckler (Eds.), Human–Computer Interaction – INTERACT 2009, Lecture Notes in Computer Science, vol. 5727, Springer, Berlin Heidelberg, 2009, pp. 40–53.
[54] Christian Schumacher, Bernhard Thomaszewski, Stelian Coros, Sebastian Martin, Robert Sumner, Markus Gross, Efficient simulation of example-based materials, in: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '12, Eurographics Association, Aire-la-Ville, Switzerland, 2012, pp. 1–8.
[55] Yeongho Seol, Carol O'Sullivan, Jehee Lee, Creature features: online motion puppetry for non-human characters, in: Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '13, ACM, New York, NY, USA, 2013, pp. 213–221.
[56] Rajvi Shah, P.J. Narayanan, Trajectory based video object manipulation, in: IEEE International Conference on Multimedia and Expo 2011 (ICME 2011), 2011.
[57] Hyun J. Shin, Jehee Lee, Sung Y. Shin, Michael Gleicher, Computer puppetry: an importance-based approach, ACM Trans. Graph. 20 (2) (April 2001) 67–94.
[58] Scott S. Snibbe, A direct manipulation interface for 3D computer animation, Comput. Graph. Forum 14 (3) (August 1995) 271–283.
[59] David J. Sturman, Computer puppetry, Comput. Graph. Entertainment (February 1998).
[60] Ubisoft. Prince of Persia: Sands of Time. Ubisoft, Sony Computer Entertainment, 2003.
[61] Anna Vögele, Max Hermann, Björn Krüger, Reinhard Klein, Interactive steering of mesh animations, in: Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '12, Eurographics Association, Aire-la-Ville, Switzerland, 2012, pp. 53–58.
[62] Daniel Wigdor, Dennis Wixon. Brave NUI World: Designing Natural User
Interfaces for Touch and Gesture. Morgan Kaufmann, 1st ed., April 2011.
[63] Andrew D. Wilson, Shahram Izadi, Otmar Hilliges, Armando G. Mendoza, David
Kirk. Bringing physics to the surface, in: UIST08: Proceedings of the 21st
Annual ACM Symposium on User Interface Software and Technology, ACM,
New York, NY, USA, 2008, pp. 6776.
[64] Katsu. Yamane, Yoshihiko. Nakamura, Natural motion animation through
constraining and deconstraining at will, IEEE Trans. Vis. Comput. Graph. 9 (3)
(2003) 352360.
[65] Shumin Zhai, Paul Milgram, Quantifying coordination in multiple DOF
movement and its application to evaluating 6 DOF input devices, in: CHI98:
Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1998,
pp. 320327.
[66] K. Yamane, Y. Ariki, J. Hodgins, Animating non-humanoid characters with
human motion data, in: Proceeding of the 2010 ACM SIGGRAPH/Eurographics
Symposium on Computer Animation, 2010, pp. 169178.
[67] A. Sanna, F. Lamberti, G. Paravati, G. Carlevaris, P. Montuschi, Automatically
mapping human skeletons onto virtual character armatures, in: 5th
International Conference on Intelligent Technologies for Interactive
Entertainment (INTETAIN2013), Mons, Belgium, July 35, 2013. pp. 8089.
[68] Ken Hinckley, Randy Pausch, Dennis Proftt, Neal F. Kassell, Two-handed virtual
manipulation, ACM Trans. Comput. Hum. Interact. 5 (3) (1998) 260302.
[69] Ravin Balakrishnan, Gordon Kurtenbach, Exploring bimanual camera control
and object manipulation in 3D graphics interfaces. in: Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems, CHI 99, ACM,
New York, NY, USA, 1999, pp. 5662.
[70] Benjamin Walther-Franks, Marc Herrlich, Rainer Malaka, A Multi-Touch
system for 3D modelling and animation, in: Proceedings of the International
Symposium on Smart Graphics, Springer-Verlag, Berlin, Heidelberg, 2011.
[71] Benjamin Walther-Franks, Marc Herrlich, Thorsten Karrer, Moritz
Wittenhagen, Roland Schrder-Kroll, Rainer Malaka, Jan Borchers,
DRAGIMATION: Direct manipulation keyframe timing for performance-based
animation, in: Proceedings of Graphics Interface, GI 12, Canadian Information
Processing Society, Toronto, Ontario, Canada, 2012.
