Entertainment Computing
journal homepage: ees.elsevier.com/entcom
Article info
Article history:
Received 31 March 2014
Revised 15 July 2014
Accepted 19 August 2014
Available online 2 September 2014
Keywords:
Motion design interfaces
Performance animation
Human-computer interaction
Design space
Abstract
Design of and research on animation interfaces rarely use the methods and theory of human-computer interaction (HCI). Graphical motion design interfaces are based on dated interaction paradigms, and novel procedures for capturing, processing and mapping motion are preoccupied with aspects of modeling and computation. Yet research in HCI has come far in understanding human cognition and motor skills and how to apply this understanding to interaction design. We propose an HCI perspective on computer animation that relates the state of the art in motion design interfaces to the concepts and terminology of this field. The main contribution is a design space of animation interfaces. This conceptual framework aids in relating strengths and weaknesses of established animation methods and techniques. We demonstrate how this interaction-centric approach can be put into practice in the development of a multi-touch animation system.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Moving images are omnipresent in cinema, television, computer games and online entertainment. Digital media such as text,
images and film are nowadays produced by a diverse crowd of
authors, ranging from beginners and laymen to professionals. Yet
animation is still seen by most people as a highly sophisticated
process that only experts can master, using complex interfaces
and expensive equipment. However, consumer motion capture
technology has recently created a mass market for easy-to-use animation tools: computer games. In contrast to most professional animation tools, recent games employ full-body interaction, for instance via Kinect, allowing users to control a virtual
character instantaneously through their body. This trend is feeding
back into the area of the experts, with researchers investigating
time-efficient interfaces for computer puppetry using the Kinect (e.g. [61,55]). Computer animation is currently seeing an influx of ideas coming from the world of easy-to-use game interfaces made
for players with no prior training. Game designers in turn are
informed by design knowledge and methods developed over
decades of research in human-computer interaction (HCI).
It is thus time that computer animation be approached from an
HCI perspective. This could aid in describing and analyzing the vast
spectrum of animation techniques ranging from very intuitive puppetry interfaces for computer games to highly sophisticated control in advanced animation tools. Our goal is to understand
principles that underlie human-machine interactions in computer
animation. With new ways of thinking about interactions with
continuous visual media and a thorough investigation of new animation interfaces on a theoretical foundation, motion design interfaces can be made more beginner- and expert-friendly.
This can be achieved by embedding computer animation methods and interfaces in an HCI context. Trends in motion design
interfaces can be connected with discussions on next generation
interfaces in HCI. Theoretical frameworks can support a thorough analysis of concrete user interface issues, which in turn can guide the design of new mechanisms for more natural and intuitive means of motion creation and editing.
This article approaches this goal in three main steps. We will
first review related work from computer graphics, human-computer
interaction and entertainment computing from a user- and interface-centric perspective with a focus on methods, mappings and
metaphors. In the second step we construct a design space for interfaces that deal with spatiotemporal media. In the third step, the
utility of this conceptual framework is illustrated by applying it in
designing a multi-touch interactive animation system.
2. Animation techniques: an interaction view
Computer-based frame animation is the direct successor of traditional hand-drawn animation, and is still the main method.
Advances in sensing hardware and processing power have brought
entirely new possibilities. Motion capture records the live performance of actors, introducing a new form of animation more akin
to puppetry than traditional animation. Programmed animation
enables realistic simulations to provide interesting secondary
motion and create more believable worlds.
Traditionally, in computer-based keyframe animation, only
extreme poses or key frames need to be manually established by
the animator. Each keyframe is edited using manipulation tools,
which can be specialized for the target domain, e.g. character
poses. Some manipulation tools allow influencing dynamics
directly in the scene view. The most common means of specifying
dynamics is by using global descriptions, such as time plots or
motion paths. Spatial editing between keyframes can be achieved
indirectly by editing interpolation functions or by defining a new
key pose.
Motion timing is usually done via global descriptions of dynamics. However, some temporal control techniques directly operate
on the target. Snibbe [58] suggests timing techniques that do not
require time plots but can be administered by directly manipulating the target or its motion path in the scene view. As with spatial
editing, the practicality of temporal editing with displacement
functions depends heavily on the underlying keyframe distribution. Timing by direct manipulation in the scene view is also supported by the latest animation software packages. Tweaking
motion trail handles allows for temporal instead of spatial translation; visual feedback can be given by changing frame numbers
adjacent to the handle. Spatial control of time has also been proposed for video navigation [15].
Motion graphs are two-dimensional plots that map transformation values (vertical axis) against time (horizontal axis). With a
2DOF input device, such a graph thus allows integrated, simultaneous spatiotemporal control. In keyframe animation the motion
editor is the standard way to manage keyframe value interpolation,
typically by means of Bézier curve handles.
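To make the role of such handles concrete, the following is a minimal sketch of Bézier-based keyframe value interpolation as a graph editor might perform it. The keyframe layout, handle representation and the bisection-based inversion of the time curve are illustrative assumptions, not any particular package's implementation.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a 1D cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

def interpolate(key_a, key_b, time):
    """Interpolate the animated value between two keyframes.

    Each keyframe is ((t, v), out_handle, in_handle); handles are (t, v)
    control points that shape ease-in/ease-out (hypothetical layout).
    """
    (t0, v0), out_h, _ = key_a
    (t1, v1), _, in_h = key_b
    # Find the curve parameter whose time component matches `time` by
    # bisection (production systems invert the time polynomial instead).
    lo, hi = 0.0, 1.0
    for _ in range(32):
        mid = (lo + hi) / 2
        if cubic_bezier(t0, out_h[0], in_h[0], t1, mid) < time:
            lo = mid
        else:
            hi = mid
    s = (lo + hi) / 2
    return cubic_bezier(v0, out_h[1], in_h[1], v1, s)

# Two keyframes: value 0 at t=0, value 10 at t=1, with easing handles.
a = ((0.0, 0.0), (0.3, 0.0), None)    # out-handle flattens the start
b = ((1.0, 10.0), None, (0.7, 10.0))  # in-handle flattens the end
print(interpolate(a, b, 0.5))         # eased value at mid-time, ~5.0
```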
In contrast to keyframe animation, performance animation uses
motion capture of the live performance of an actor or puppeteer by
tracking a number of key points in space over time and combining
them to obtain a representation of the performance. The recorded
data then drives the motion of a digital character. The entire procedure of applying motion capture data to drive an animation is
referred to as performance animation [44]. In a typical setup, an
actor's motion is first recorded, then the data is cleaned, processed
and applied to a digital character. Since the digital character can
have quite different proportions than the performer, retargeting
the motion data is a non-trivial task [24]. In this form of performance animation, capture and application of motion data to an
animation are two separate processes; data handling is done offline. Online performance animation immediately applies captured
data to a digital character, creating animation instantly, allowing
the performer to react immediately to the results or to interact
with an audience [59,24]. Processing limitations often mean that performers can only see a low-fidelity pre-visualization of the final rendering [44].
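The offline/online distinction can be illustrated with a small sketch of an online loop, in which each captured frame drives the character immediately so the performer can react to the result. The capture device, character and renderer interfaces below are hypothetical stand-ins, not an actual motion capture API.

```python
import time

def online_performance_loop(capture_device, character, renderer, fps=30):
    """Apply each captured frame to the character as soon as it arrives
    (online performance animation); all object APIs are assumptions."""
    frame_time = 1.0 / fps
    while capture_device.is_running():
        joints = capture_device.read_frame()      # e.g. {joint: transform}
        for name, transform in joints.items():
            character.set_joint(name, transform)  # direct, unretargeted
        renderer.draw(character)                  # instant visualization
        time.sleep(frame_time)
```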
Many performance animation efforts aim to represent human
motion accurately and limit abstraction to a minimum; motion capture performers use only the senses with which
they have learned to act (e.g. kinaesthetic and proprioceptive feedback). For performance animation of stylized or non-humanoid
characters it is desirable to control them in a less literal fashion.
Such a style of performance control is often referred to as computer
or digital puppetry [3,59]. Just as traditional puppeteers would rely
on mirrors or camera feeds to adjust their performance, computer
puppetry requires instant renderings of the applied input to allow
performers to adjust their motions. Real-time mappings either use
high bandwidth devices for coordinated control of all character
DOF, or employ models based on example data or a physical
Fig. 1. The design space of animation interfaces characterizes the entities involved
in the interaction and their relations.
space → space
space → time
time → space
time → time

Often presentation space and time will be modified in an integrated fashion, or spatial and temporal control will both figure into the input-output relation. For this we introduce two control-integrated space-time categories that cover input-output mappings in which both control dimensions contribute to the relation

space-time → space (i.e., space → space and time → space)
space-time → time (i.e., space → time and time → time)

and two presentation-integrated space-time categories in which both presentation dimensions are affected by the interaction:

space → space-time (space → space and space → time)
time → space-time (time → space, time → time)

The final cases are the fully integrated space-time categories

space-time → space-time (space → space and time → time)
space-time → time-space (space → time and time → space)

which reflect that integrated control dimensions affecting presentation domains in an integrated way can be matched in two ways. These ten space-time categories cover all variants of mapping user space-time to medium space-time. A simple means of visualizing this is a 3 × 3 matrix, where the central cell is compartmented into two, since relating both control and presentation space and time is ambiguous (Fig. 2).
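As a compact restatement of the taxonomy, the ten categories can be written out as data, mapping each (control, presentation) pair to the elementary dimension-to-dimension relations it is composed of. The encoding below is purely illustrative.

```python
# The ten space-time mapping categories from the design space, keyed by
# (control dimensions, presentation dimensions). Tuple encoding is an
# illustrative choice, not part of the framework itself.
CATEGORIES = {
    ("space", "space"):         [("space", "space")],
    ("space", "time"):          [("space", "time")],
    ("time", "space"):          [("time", "space")],
    ("time", "time"):           [("time", "time")],
    # control-integrated: both control dimensions contribute
    ("spacetime", "space"):     [("space", "space"), ("time", "space")],
    ("spacetime", "time"):      [("space", "time"), ("time", "time")],
    # presentation-integrated: both presentation dimensions are affected
    ("space", "spacetime"):     [("space", "space"), ("space", "time")],
    ("time", "spacetime"):      [("time", "space"), ("time", "time")],
    # fully integrated: the two ways of matching integrated dimensions
    ("spacetime", "spacetime"): [("space", "space"), ("time", "time")],
    ("spacetime", "timespace"): [("space", "time"), ("time", "space")],
}

assert len(CATEGORIES) == 10
```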
The first row of the matrix describes control mappings that only
look at the spatial component of the input and do not consider the
timing of the users input. The third row describes control
mappings where input has no spatial component, and the user only
administers state changes with temporal triggers via controls
such as buttons. The second row describes control mappings where
spatial input stands in a temporal context. There are borderline
[Fig. 3: 3 × 3 matrix of space-time mappings; rows give control dimensions (space, space-time, time), columns presentation dimensions. Cells list example applications, techniques and scenarios, including: motion editing via graph editor (adjusting ease-in/ease-out); time control via timeline bar (browsing a video); interactive travel in static virtual environments via steering (browsing a 3D information space); performance animation and video games; computer puppetry (animating a character); time control via jog shuttle (browsing a video); passive travel in static virtual environments via target-based navigation/fly-throughs (exploring architectural models); passive travel in dynamic virtual environments (watching a cut-scene in a 3D video game); playback via triggers/buttons (watching a video); posing a character.]
Fig. 3. Nine categories of space-time mappings with example applications, techniques and scenarios of use. (Figure contains cropped stills of third party material licensed
under CC BY 3.0. Top left, top right and bottom left images attributed to the Durian Blender Open Movie Project; bottom left image attributed to Frontop Technology Co., Ltd;
bottom center image attributed to Apricot Blender Open Game Project).
support the mental distinction between phenomenologically similar spatial editing and navigation operations on interactive
surfaces.
Manipulation is the most general metaphor for puppet control.
Through manipulation the puppeteer can flexibly create and release mappings with a drag-and-drop style of interaction; directness minimizes mediation between user and target domain. For complex transformations, as are often necessary in character animation, rigs should be designed so that handles promote as direct a manipulation as possible, meaning that handles should be co-located with the features they influence and the handle-feature
mapping designed to support maximal correspondence. Regarding
kinematic versus physics-based manipulation mappings, realism
and emergent control styles stand against precision, predictability
and reliability. In animation, full control has a higher priority than
realism, which is why we opt for purely kinematic controls.
4.1.5. Directness
Interactive surfaces can reduce the distance between the user
and the target to a minimum. However, touch input also has potential disadvantages, such as imprecision (when mapping the finger contact area to a single point) and occlusion of on-screen content by the user's fingers, hands and arms [62]. Re-introducing indirection can alleviate the occlusion problem. Since absolute input techniques require reaching every part of the screen, which can become difficult when the display exceeds a certain size, limiting the interaction area to a part of the screen or adding indirection mechanisms can help [18]. The spatial distance between input
and target can also be used as a parameter for interaction design.
For instance, fingers or pens touching the target can control different DOF than off-target contacts (mode change). Layered motion
recording can involve manipulating moving targets after the initial
capture pass. Relative mapping applies transformation relative to
the initial input state. This allows arbitrary input location, and
clutching can increase the comfort of use. Both absolute and relative input can be applied locally and globally, which makes a significant difference when controlling behavior of a feature that
inherits motion from its parents. Local mapping allows the user
to ignore motion of parent features and concentrate on local transformations. By default, performance control of a feature overwrites
any previous recordings made for it. In this way, performers can
practice and test a motion until they get it right. They might however want to keep aspects of an original recording and change others. Blending a performance with a previous recording expands the
possibilities for control. It allows performance-based editing of
existing animations.
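The absolute/relative distinction and the role of clutching can be sketched in a few lines. The Target class and 2D screen positions below are simplifying assumptions; a real system would operate on full transforms.

```python
from dataclasses import dataclass

@dataclass
class Target:
    """Illustrative stand-in for any manipulable scene feature."""
    position: tuple = (0.0, 0.0)

def apply_absolute(target, touch_pos):
    # Absolute mapping: the target follows the finger 1:1, so the finger
    # must physically reach the target's on-screen location.
    target.position = touch_pos

class RelativeStroke:
    """Relative mapping: the target moves by the finger's displacement
    since the stroke began. Input may start anywhere, and clutching
    (lift, reposition, touch down again) extends the range of motion."""
    def begin(self, target, touch_pos):
        self.touch0 = touch_pos
        self.target0 = target.position
    def move(self, target, touch_pos):
        dx = touch_pos[0] - self.touch0[0]
        dy = touch_pos[1] - self.touch0[1]
        target.position = (self.target0[0] + dx, self.target0[1] + dy)
```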
4.1.6. Orchestration
Studies by Forlines et al. [19] and Kin et al. [38] demonstrated
that the benefits of two-handed (symmetric) input also transfer
to interactive surfaces for basic selection and dragging tasks. The
difficulty is to get users to use both hands, since single-handed
controls in typical UIs can prime them. To maximize the options,
our system should allow one-handed as well as symmetrical and
asymmetrical bimanual input. The 2D capture approach implies
that no single spatial manipulation requires more than a single
hand. Consequently, two single-handed operations can easily
be combined to enable parallel operation, for instance one hand
per character limb, allowing emergent asymmetric and symmetric
control (cf. [11]).
If individual sets of camera parameters are controlled with a
single hand, this allows emergent styles of interaction. Combining
two different camera operations, one with each hand, allows
asymmetric view control. For instance, left hand panning and right
hand zooming can be combined into simultaneous 3DOF view control. A combination of left-handed view control with right-handed
by the view plane. Single-finger input maps to selection (tap) and
translation (drag). In linked feature hierarchies such as skeleton
rigs, the translation is applied to the distal bone end, rotating the
bone around screen z axis. Dragging directly on a target enables
selection and translation in a single fluid motion. Alternatively, the drag gesture can be performed anywhere on screen, allowing indirect control of a previously selected target. Indirect dragging thus
requires prior selection to determine the input target. Selection is
the only context-dependent operator, as it determines the target
by ray casting from the tapped screen coordinates.
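The tap/drag dispatch described above can be sketched as follows; the event, scene and state objects are hypothetical stand-ins for the system's actual types.

```python
def on_touch(event, scene, state):
    """Dispatch single-finger input to selection or translation;
    `event`, `scene` and `state` interfaces are assumptions."""
    if event.kind == "tap":
        # Selection is the only context-dependent operator: the target
        # is found by ray casting from the tapped screen coordinates.
        state.selection = scene.ray_cast(event.screen_pos)
    elif event.kind == "drag_begin":
        hit = scene.ray_cast(event.screen_pos)
        # On-target drags select and translate in one fluid motion;
        # off-target drags indirectly control the prior selection.
        state.drag_target = hit if hit is not None else state.selection
    elif event.kind == "drag_move" and state.drag_target is not None:
        # Translation in the view plane; for a skeleton rig this is
        # applied to the distal bone end, rotating about screen z.
        state.drag_target.translate_in_view(event.delta)
```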
Layered animation is supported via absolute and additive mappings. Absolute mode is the standard mapping, additive mode
must be activated via the GUI. The standard absolute mapping
overwrites any previous transformation at the current time. In
the absence of parent motion this ensures 1:1 correspondence
between input and output. With parent motion, control becomes
relative to the parent frame of reference (local). Additive layering
preserves existing motion and adds the current relative transformation to it. By changing the view between takes so that the input-output mapping affects degrees of freedom that could not be affected in previous takes (e.g. by orbiting the view 90 degrees around screen y), the animator can add depth and thus create more three-dimensional motion.
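A minimal sketch of the two recording modes for one animated channel follows; representing frames as dictionary keys and values as translation triples is an illustrative simplification.

```python
def record_absolute(track, frame, value):
    # Standard absolute mapping: overwrite any previous transformation
    # recorded at the current time.
    track[frame] = value

def record_additive(track, frame, delta):
    # Additive layering: preserve the existing motion and add the
    # current relative transformation on top of it.
    base = track.get(frame, (0.0, 0.0, 0.0))
    track[frame] = tuple(b + d for b, d in zip(base, delta))

track = {}
record_absolute(track, 10, (1.0, 0.0, 0.0))  # first take: x motion
record_additive(track, 10, (0.0, 0.0, 0.5))  # later take adds depth (z)
print(track[10])                             # (1.0, 0.0, 0.5)
```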
The three camera operators pan, orbit and zoom map to
two-, three-, and four-finger gestures (Fig. 5). Assigning chorded multi-finger gestures to view operators has no precedent in the real world or prior work, and there are good arguments for different choices. A sensible measure is the frequency of use of a certain view control, and thus one could argue that the more commonly used functions should be mapped to the gestures with the smaller footprint, i.e. fewer fingers. A camera dolly move or zoom is probably the least used view control, which is why we decided to map it to the four-finger gesture: users can zoom in and out by moving four fingers up or down screen y. Three fingers allow camera orbit by
the turntable metaphor: movement along the screen x axis controls
turntable azimuth, while motion along screen y controls camera
altitude. Two ngers pan the view along view plane x and y axes.
Like transformation controls, camera controls are context-free,
meaning they can be activated anywhere in the camera view.
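The finger-count mapping reduces to a small dispatcher; the camera interface below is a hypothetical stand-in.

```python
def on_view_gesture(camera, finger_count, dx, dy):
    """Context-free camera control: gestures work anywhere on the view.
    The camera methods are assumed, not an actual API."""
    if finger_count == 2:
        camera.pan(dx, dy)          # translate along view-plane x and y
    elif finger_count == 3:
        # Turntable orbit: screen x drives azimuth, screen y altitude.
        camera.orbit(azimuth=dx, altitude=dy)
    elif finger_count == 4:
        camera.zoom(dy)             # dolly/zoom via vertical motion only
```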
A view attachment mode, when active, fixes the view camera to the currently selected feature during all camera operations, moving the camera along with dynamic targets (Fig. 6). The camera-feature offset is maintained and can be continuously altered
depending on camera operator as described above. After establishing the attachment by starting a view control gesture, new targets
can be selected and manipulated. Releasing the camera control
immediately ends the attachment, rendering the camera static.
By combining one-handed view control and capture in an asymmetric manner, this approach can solve indirection in control of
dynamic targets.
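The attachment behavior amounts to pinning the camera to the selected feature's frame of reference. The sketch below keeps a constant positional offset, a simplification of the full frame-of-reference coupling; camera and feature types are assumptions.

```python
class ViewAttachment:
    """Keeps the camera-feature offset constant while the attachment
    gesture is held; camera/feature interfaces are hypothetical."""
    def __init__(self):
        self.feature = None
        self.offset = None

    def begin(self, camera, feature):
        self.feature = feature
        self.offset = tuple(c - f for c, f in
                            zip(camera.position, feature.position))

    def update(self, camera):
        # Called every frame: follow the (possibly animated) feature.
        if self.feature is not None:
            camera.position = tuple(f + o for f, o in
                                    zip(self.feature.position, self.offset))

    def end(self):
        # Releasing the camera gesture renders the camera static again.
        self.feature = None
```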
The time control interface features several buttons and a timeline. Simple play/pause toggle buttons start and stop the playback
within a specified time range. A timeline gives the animator visual feedback on the remaining loop length in multi-track capture, supporting anticipation. It also enables efficient temporal navigation: with a one-finger tap the animator can set the playhead to a specific frame. A continuous horizontal gesture allows for interactive
playback, allowing direct control of playback speed.
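The tap and drag behaviors reduce to a simple screen-to-frame mapping, sketched below; the Timeline fields are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Timeline:
    width: float          # on-screen width in pixels (assumed)
    frame_count: int
    playhead: float = 0.0

def tap(tl, x):
    # One-finger tap: jump the playhead to the tapped frame.
    frame = x / tl.width * tl.frame_count
    tl.playhead = max(0.0, min(frame, tl.frame_count - 1))

def drag(tl, dx):
    # Continuous horizontal gesture: scrub the playhead, so gesture
    # speed directly controls playback speed.
    frame = tl.playhead + dx / tl.width * tl.frame_count
    tl.playhead = max(0.0, min(frame, tl.frame_count - 1))
```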
5. Evaluation
The design framework was a powerful aid for structuring design
options for the novel multi-touch animation system presented
above. We have also used it in the design of a performance-based
animation timing technique (Walther-Franks et al. [71]) and are
Fig. 6. The view attaching technique. Features can inherit motion from parents animated in previous motion layers. In such cases direct control is not possible. By attaching
the view to the feature's frame of reference, direct control is reintroduced.
for animating its limbs once they had created animation for the
root bone. The benet of locking the view to a frame of reference
in this way seemed immediately apparent to them, and was
greeted with enthusiasm in two cases.
Given the short timeframe and lack of experience in performance animation, participants were able to create surprisingly
refined character motion. Four were able to create expressive character animations within the short timeframe of 10 min in the free animation task. These included walk, jump and squat motions as well as dance moves.
Inexperienced users had a harder time comprehending spatial relationships, while those with more experience in 3D animation picked up the controls noticeably more fluently. This comes as no surprise, as using and controlling software takes time and practice, regardless of the interface. For novice and casual users, our 2DOF strategy seems appropriate, since it constrains manipulation along the depth dimension. However, the interface might need improvement in visualizing these constraints and giving more hints on depth cues.