ABSTRACT
Motion compensated interpolation (MCI) refers to the process of taking a video sequence, finding motion information, and then using that information to produce interpolated video frames between source frames. In this study, we compare two MCI techniques: adjacent-frame motion compensated interpolation (AF-MCI) and wide-span motion compensated interpolation (WS-MCI). Using reproducible, artificially generated video sequences, the methods are quantitatively compared with the objective of optimizing interpolated frame quality relative to control interpolated frames. This is useful because on a large flat-panel display with high resolution (such as HDTV), frame transition coherence becomes a crucial factor in assessing the quality of the user's viewing experience. To enhance MCI, the encoder should attempt to exploit long-term statistical dependencies, precisely estimate motion by modeling the motion vector field, and superimpose efficient prediction/interpolation algorithms. WS-MCI achieves this. Computer simulations using artificially generated video sequences demonstrate the consistent advantage of WS-MCI over adjacent-frame MCI under increasingly complex source scenes and chaotic occlusion conditions.

Categories and Subject Descriptors
H.5.1 [Information Interfaces and Representation]: Multimedia Information Systems - video; C.3 [Special-Purpose and Application-Based Systems]: Signal Processing Systems; J.5 [Arts and Humanities]: Fine Arts

General Terms
Algorithms, Performance, Design, Experimentation, Theory.

Keywords
Frame interpolation, slow motion, high definition, video, postproduction, ambient, video art.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ACE 06, June 14-16, 2006, Hollywood, California, USA.
Copyright 2006 ACM 1-59593-380-8/06/0006 ...$5.00.

1. INTRODUCTION
Motion Estimation (ME) and Motion Interpolation (MI) are two ordered constituent phases of Motion Compensated Interpolation (MCI). They are fundamental problems in digital video processing and are the subject of much research [17]. Furthermore, image-plane motion estimation is an integral part of motion compensated filtering and compression [18]. The predicted need for increased capabilities for slow motion in the postproduction phase of creative video work in the realm of high definition will also involve the use of related techniques [20].

Typical ME tracks the movement of objects, or parts of objects, from one frame to the next. Chupeau & Salmon [8] used a motion compensation algorithm where all pixels in the interpolated frames are assumed to have at least one motion vector from the sets of adjacent frames {-1→0}, {0→1}, and {1→2}. This is known as AF-MCI. In contrast, our novel method always tracks motion across the crucial central time frame gap {0→1} into which interpolated frames will be placed. We call this the "sweet spot".

We know from physics that an object in motion tends to stay in motion. This means that linear motion is more likely than oscillatory motion, and vectors that span a few frames may be better able to depict object motion than motion vectors that span only one frame. This is especially true since sampling of our analog world into the digital pixels of successive video frames means that each frame cannot capture the scene completely due to quantization and rasterization sampling errors (e.g. precise color at object edges). Moreover, when performing ME, tracking a sum-of-luminance (SOL) value from one adjacent frame to another may lead to erroneous and erratic motion vectors because there is not a 1:1 correspondence of similar SOL values from one frame to another. In other words, due to quantization and rasterization errors during filming, there is no guarantee that each pixel will map into any other frame. Aside from sampling errors tainting motion information, the AF-MCI method for ME is fallible because it assumes that each pixel has a good chance of being present in each adjacent frame. This assumption is perhaps unfounded because different parts of an object can be obscured, transformed, blurred, or occluded by another object; therefore, each pixel will not necessarily appear in each adjacent frame.

A small amount of motion can result in large differences in the pixel values of a scene, especially near the edges of an object. Also, a wider field of view and higher resolution (HDTV) make it harder to fool the eye, and therefore more difficult to do motion compensation [14]. The effectiveness of MCI will depend on the ME and MI phases as well as on source video scene complexity.

MCI is useful in various circumstances. Current NTSC frame rates may be increased to accommodate newer HDTV or multimedia standards. MCI can be used to create new frames for frame rate up-sampling and slow motion video [5]. Although pixel-wise MCI was abandoned more than a decade ago [7], recent hardware advances make it an emerging viable alternative to conventional region-based techniques.

We propose to address some of these problems by using temporal motion compensation that analyzes a "wide span" of multiple frames over time (WS-MCI). Performing ME on SOL values across the sweet-spot frame gap for all motion vectors allows us to focus exploitation of available motion information in the time span that is most crucial. Additionally, WS-MCI can determine how an SOL value has moved over a wider span of time. Thereby, it can better ignore movement 'noise' between adjacent frames because it does not assume that each pixel is present in every frame. This heuristic reduces quantization and rasterization error creeping into the motion vectors.

The rest of the paper is organized as follows: Section 2 highlights the importance of slow motion in video aesthetics. Section 3 gives an overview of the MCI process. Section 4 briefly describes related techniques. The methods used in this study are covered in Section 5. The next section presents our results and a discussion of them. Before concluding the paper, we summarize our future work.

2. SLOW MOTION AND VIDEO AESTHETICS
We have argued in [20] that the introduction of video technologies such as large-scale flat-screen displays and high-definition video standards will transform our traditional conception of televisual aesthetics. We maintain, among other effects, that the television image will become more pictorial, approaching the cinematic in ways that video has never been able to match. This trend will see a new televisual aesthetic that emphasizes cinematic (and photographic) visual staples such as the sweeping wide-shot, composition in depth, the play of light and shadow, revelation of patterns and textures, and the manipulation of the time base of the moving image [25].

Changing the time base to depict slow motion is the most difficult technical feat in this array of cinematic visual devices that will be adopted in the emergence of a new video aesthetic. Slow motion was possible to realize in cinema, relying either on over-cranking the camera (running it faster than the 24 frames per second projection speed) or on multiple-printing in postproduction. The latter was of limited utility, typically losing visual quality and impact after the postproduction film speed was slowed by anything larger than a factor of two. However, it was possible to over-crank the camera with a variety of brands and models, and producing slow motion while shooting was a reasonably accessible technique utilized by a wide variety of theatrical, documentary and experimental filmmakers. Nonetheless, most video cameras cannot produce slow motion by over-cranking. Further, converting footage to slow motion in video postproduction is often subject to the same problem as film when the action is slowed by greater than a factor of two: the illusion of smooth motion is lost.

Slow motion can be used for a variety of effects, and so its relative difficulty in video production is problematic. Bordwell & Thompson point out that slow motion has been used for such diverse ends as super-realism, fantasy, lyricism, or the visceral treatment of cinematic violence [22]. Zettl has a similar list, maintaining that the technique can allow us to see phenomena more clearly as well as imply a freedom from gravity, the agony of not reaching a goal, or even an increased awareness of subject speed [23]. The authors of this paper have maintained that slow motion will be one of the stylistic foundations of a new "ambient video" form supported by the imminent dissemination of large, high-definition flat-panel home television display units. This new form builds on a long artistic history of the "slow-form" exploration of the moving image. Film and video artists as diverse as Michael Snow (Wavelength), Andy Warhol (Empire and many others), Yoko Ono (Sky), Ron Fricke (Baraka), and Bill Viola (The Greeting and many others) have explored the power of the slowly changing moving image. Growing contemporary interest in the ambient video form is confirmed by the latest SIGGRAPH Call for Works, which includes the category "4D Wall Hung Work", described as "Works that reside in a frame or on the wall but move": for instance, a plasma screen with a slowly evolving image, or a projection onto a frame on the wall [24].

Despite the natural attraction of the well-composed, slowly moving image, it is not a simple goal to achieve. The current generation of video artists and producers has relatively inexpensive access to a variety of sophisticated, high-quality production and postproduction tools, but slow motion remains difficult to produce.

3. MCI OVERVIEW
A movie is composed of discrete video frames, where each frame is a 2D image. Motion estimation uses the source frames to deduce the motion vectors, and interpolation uses that information to produce the new interpolated frames (Figure 1). MCI can be iterated for each time gap in the source video sequence to produce a video sequence with multiple new frames (Figure 2). This results in slow motion if the movie's frame rate is preserved when viewed. Multiple time gaps, into which interpolated frames are placed, represent multiple iterations of the MCI process. Also, the identities of the frame indices will shift forward in time for each MCI iteration such that the sweet spot spans different pairs of successive frames. In Figure 2, only one new frame is inserted into the video sequence per source frame time gap; however, once motion information is gleaned from the source video, any number of interpolated frames can be produced between source frames (Figure 3).
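The ME-then-MI iteration over source time gaps can be sketched in code. The following is a minimal illustration under a deliberately strong assumption: a single global motion vector per frame gap, rather than the per-pixel SOL tracking used in this paper. The helper names `estimate_motion`, `interpolate`, and `mci_upsample` are our own, not part of any described implementation.

```python
# Minimal MCI sketch: one global motion vector per gap (an illustrative
# simplification of per-pixel SOL tracking). Frames are 2D lists of
# grayscale values.

def estimate_motion(a, b, search=2):
    """ME: exhaustive search for the global shift (dx, dy) that best
    maps frame a onto frame b. Target pixels with no source pixel are
    compared against black."""
    h, w = len(a), len(a[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            err = 0
            for y in range(h):
                for x in range(w):
                    sy, sx = y - dy, x - dx
                    if 0 <= sy < h and 0 <= sx < w:
                        err += abs(b[y][x] - a[sy][sx])
                    else:
                        err += b[y][x]  # no source pixel: compare to black
            if err < best_err:
                best_err, best = err, (dx, dy)
    return best

def interpolate(a, vec, t):
    """MI: produce the frame at fractional time t in (0, 1) by moving
    frame a part-way along the motion vector."""
    h, w = len(a), len(a[0])
    dx, dy = round(vec[0] * t), round(vec[1] * t)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = a[sy][sx]
    return out

def mci_upsample(frames, k=1):
    """Iterate MCI over every source time gap, inserting k new frames
    per gap: (k+1)-times slow motion at a preserved playback rate."""
    out = []
    for f0, f1 in zip(frames, frames[1:]):
        out.append(f0)
        vec = estimate_motion(f0, f1)
        for i in range(1, k + 1):
            out.append(interpolate(f0, vec, i / (k + 1)))
    out.append(frames[-1])
    return out
```

Practical schemes operate per pixel or per block and must handle occluded, appearing, and disappearing areas; see, e.g., [8], [12], and [13].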
The interpolation step uses motion information to deduce where pixels (SOL value positions) will migrate over time across the source sweet spot. Figure 4 further illustrates that, for each MCI iteration, any number of interpolated frames can be produced within each sweet spot because the motion vectors pass through an infinite number of possible new frames within the time span.

[Figure 1: the MCI process. ME takes the source video sequence and produces motion information (motion vectors); interpolation then places new frames into the "sweet spot" between source frames {0} and {1}.]
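The argument for wide-span vectors, that motion sampled across several frames suppresses per-frame sampling noise, can be illustrated numerically. This toy example is ours, not the paper's SOL search: it compares a velocity taken from the single adjacent pair {0→1} against a least-squares linear fit over the span {-1, 0, 1, 2} for an object in constant linear motion whose observed positions carry per-frame error.

```python
# Toy comparison: adjacent-frame vs. wide-span velocity estimation for a
# 1D object position sampled with per-frame noise (a stand-in for
# quantization and rasterization error).

def adjacent_velocity(positions, i):
    """Velocity from a single frame pair {i -> i+1}."""
    return positions[i + 1] - positions[i]

def wide_span_velocity(times, positions):
    """Least-squares slope of a linear trajectory fitted to all frames."""
    n = len(times)
    mt = sum(times) / n
    mp = sum(positions) / n
    num = sum((t - mt) * (p - mp) for t, p in zip(times, positions))
    den = sum((t - mt) ** 2 for t in times)
    return num / den

times = [-1, 0, 1, 2]                       # the wide span around the sweet spot
true = [3.0 * t for t in times]             # constant velocity: 3 px/frame
noise = [0.4, -0.5, 0.5, -0.4]              # per-frame measurement error
observed = [p + e for p, e in zip(true, noise)]

v_af = adjacent_velocity(observed, 1)       # uses only the {0 -> 1} pair
v_ws = wide_span_velocity(times, observed)  # fits the whole span
# v_ws lands closer to the true 3 px/frame than v_af does.
```

Because the fit pools four noisy samples, its slope error shrinks relative to a single-pair difference; this is the intuition behind searching across the sweet spot rather than only between adjacent frames.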
Table 1. Quantitative comparison of AF-MCI and WS-MCI over a progression of source video scene complexity.

scene complexity | # runs | Pearson correlation | 2-tailed P-value | av. ECCE (AF-MCI) | av. ECCE (WS-MCI) | WS-MCI % benefit
1 circle    | 40 | 0.9850 | 9.17E-12 | 1.86  | 1.58  | 15.20%
7 circles   | 40 | 0.9812 | 6.63E-15 | 10.08 | 9.27  | 8.09%
14 circles  | 40 | 0.9768 | 3.11E-08 | 19.47 | 18.78 | 3.54%
14 objects  | 80 | 0.9949 | 5.07E-11 | 26.24 | 25.34 | 3.43%
14 inkblots | 40 | 0.9958 | 2.57E-02 | 34.40 | 33.51 | 2.61%
21 inkblots | 40 | 0.9865 | 3.00E-03 | 44.93 | 44.46 | 1.05%

[Figure 11: four scatter plots of AF-MCI vs. control ECCE (x-axis) against WS-MCI vs. control ECCE (y-axis); panel b) 7 random objects (inkblots/circles), with fit y = 0.98x - 0.53; panel c) 7 random inkblots, with fit y = 0.97x - 0.55.]

Figure 11. Comparison of WS-MCI and AF-MCI methods for interpolated frame quality. Figures a), b), and c) each represent data from 40 pairs of experimental runs (AF-MCI and WS-MCI), and d) from 80 pairs. Each figure represents a different level of scene complexity. Each axis has units of the total color difference for all pixels in all frames between experimental and control interpolated frames (experimental-control-color-error, ECCE).

Table 1 shows that scenes with several simple objects, random objects, or complex objects are all handled better by WS-MCI because the P-values remain lower than 0.05. Therefore, we can conclude that the WS-MCI treatment is statistically significant and effective. Additionally, the algorithm has the same resource expense as AF-MCI. For this reason, WS-MCI should be considered as part of any new MC algorithm to produce higher quality interpolated frames, or to map motion for video compression.

The average ECCE suggests how far from ideal the results were, and serves as a metric for MCI analysis. Since 14 inkblots/circles shows approximately the same ECCE as 7 inkblots, this supports the intuitive idea that circles are easier to analyze than inkblots. These results suggest that 14 complicated objects drastically increase the analysis difficulty over 14 random objects. Not surprisingly, ECCE assay results with 14 and 21 inkblots produced more ECCE than 14 inkblots/circles.

7. FUTURE WORK
Pixel-wise ME that uses SOL search does not concern itself with color. In real-world scenes, this will be problematic, especially when an object enters or exits shadows. Future research will focus on making wide-span MCI more robust in tracking objects' colors. Other future studies will consider how WS-MCI might be adapted for video compression for use with HDTV signals, and this will involve testing using natural HDTV video sequences. Objects in our artificial scenes are moving at constant linear [...] same time span. Using backward vectors may cause more pixels to try to find a position on the interpolated frames. In addition, [...]

8. CONCLUSION
Since the goal of motion estimation and motion interpolation is to produce convincing inter-frames, it is important that we focus the search for quality motion information at the sweet spot. This idea manifests itself in the wide-span motion compensated interpolation algorithm that we have presented in this paper. Using artificial video sequences, we created a test apparatus capable of reproducing source video frames of varying complexity and used it to compare two MCI techniques: AF-MCI and WS-MCI. The resulting measure of average experimental-control-color-errors and paired t-tests show that WS-MCI consistently outperforms AF-MCI for complex video sequences. These results clearly indicate that using WS-MCI on a real-world chaotic scene would be the better choice. Our overall objective is the application of these advanced computational techniques for the creation of enhanced visual and aesthetic effects, such as slow motion, in the postproduction of high-resolution video on large-screen displays.

9. ACKNOWLEDGMENTS
This work is funded by the Social Sciences and Humanities Research Council (SSHRC) of Canada through a Research Development Initiative (RDI) grant under project number 31-639484.

10. REFERENCES
[1] M. Al-Mualla, "Motion Field Interpolation for Frame Rate Conversion," ISCAS '03, vol.2, pp.652-5, 2003.
[2] P. Anandan, "A Computational Framework and an Algorithm for the Measurement of Visual Motion," International Journal of Computer Vision, vol.2, no.3, pp.283-310, 1989.
[3] T. Andre, M. Cagnazzo, M. Antonini, and M. Barlaud, "(N,0) Motion-compensated Lifting-based Wavelet Transform," IEEE International Conference on Acoustics, Speech, and Signal Processing, vol.3, pp.121-4, 2004.
[4] P. Burt, "Multiresolution Techniques for Image Representation, Analysis, and 'Smart' Transmission," Proceedings of SPIE Conference, vol.1199, pp.2-15, 1989.
[5] B. Choi, S. Lee, and S. Ko, "New Frame Rate Up-conversion using Bi-directional Motion Estimation," IEEE Transactions on Consumer Electronics, vol.46, no.3, pp.603-9, 2000.
[6] B. Chupeau, "Multiscale Motion Estimation," Signal Processing of HDTV, vol.3, pp.233-240, 1992.
[7] B. Chupeau and E. Francois, "Region-based Motion Estimation for Content-based Video Coding and Indexing," SPIE Visual Communications and Image Processing, vol.4067, pp.884-893, 2000.
[8] B. Chupeau and P. Salmon, "Motion Compensating Interpolation for Improved Slow Motion," Signal Processing of HDTV, vol.4, pp.717-724, 1993.
[9] M. Flierl, T. Wiegand, and B. Girod, "Rate-Constrained Multihypothesis Prediction for Motion-Compensated Video Compression," IEEE Transactions on Circuits and Systems for Video Technology, vol.12, no.11, pp.957-969, 2002.
[10] B. Girod and S. Han, "Optimum Update for Motion-Compensated Lifting," IEEE Signal Processing Letters, vol.12, no.2, pp.150-3, 2005.
[11] Y. Jung and Y. Ho, "Multiple Object Tracking under Occlusion Conditions," Visual Communications and Image Processing, Proceedings of SPIE Conference, vol.4067, pp.1011-1021, 2000.
[12] S. Lee, S. Yang, Y. Jung, and R. Park, "Adaptive Motion-compensated Interpolation for Frame Rate Up-conversion," IEEE Transactions on Consumer Electronics, vol.48, no.3, pp.444-50, 2002.
[13] P. Robert, "Motion Compensating Interpolation Considering Occluding, Appearing and Disappearing Areas," Signal Processing of HDTV, vol.3, pp.329-341, 1992.
[14] S. Solari, Digital Video and Audio Compression, McGraw-Hill, New York, USA, 1997.
[16] C. Stiller and J. Konrad, "Estimating Motion in Image Sequences," IEEE Signal Processing Magazine, vol.16, no.4, pp.70-91, 1999.
[17] M. Tekalp, Digital Video Processing, Prentice Hall Inc., New Jersey, USA, 1995.
[18] Y. Wang, J. Ostermann, and Y-Q. Zhang, Video Processing and Communications, Prentice Hall Inc., New Jersey, USA, 2002.
[19] W. Zheng, Y. Kanatsugu, S. Itoh, and Y. Tanaka, "Analysis of Space-dependent Characteristics of Motion-compensated Frame Differences," International Conference on Image Processing, vol.3, pp.158-164, 2000.
[20] B. Ben Youssef, J. Bizzocchi, and J. Bowes, "The Future of Video: User Experience in a Large-Scale, High-Definition Display Environment," in Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (ACE '05), pp.204-208, Valencia, Spain, June 15-17, 2005.
[21] B. Ben Youssef, J. Bizzocchi, and J. Bowes, "High-Definition Video Processing in Postproduction: Opportunities and Challenges," in Proceedings of the 9th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2005), vol.V, pp.222-227, Orlando, Florida, July 10-13, 2005.
[22] D. Bordwell and K. Thompson, Film Art: An Introduction, McGraw-Hill, Boston, 2003.
[23] H. Zettl, Sight, Sound, Motion: Applied Media Aesthetics, Wadsworth, Belmont, CA, 2005.
[24] Call for Proposals: SIGGRAPH 2006, <http://www.siggraph.org/s2006/main.php?f=cfp&p=art&s=categories#4d>, viewed on March 3, 2006.
[25] J. Bizzocchi, "Ambient Video: The Transformation of the Domestic Cinematic Experience," in Emerging Small Technologies, University of Minnesota Press, publication forthcoming.