
FACULTY OF ENGINEERING

Department of Electronics and Informatics







Scalable Error-resilient
Coding of Meshes


Thesis submitted in fulfillment of the requirements
for the award of the degree of Doctor in Engineering
(Doctor in de Ingenieurswetenschappen) by

ir. Dan C. Cernea

September 2009

Advisors: Prof. Adrian Munteanu
Prof. Peter Schelkens


























Print: DCL Print & Sign, Zelzate

© 2009 Dan Costin Cernea

© 2009 Uitgeverij VUBPRESS Brussels University Press
VUBPRESS is an imprint of ASP nv
(Academic and Scientific Publishers nv)
Ravensteingalerij 28
B-1000 Brussels
Tel. ++32 (0)2 289 26 50
Fax ++32 (0)2 289 26 59
E-mail: info@vubpress.be
www.vubpress.be

ISBN 978 90 5487 676 2
Legal Deposit D/2009/11.161/131

All rights reserved. No parts of this book may be reproduced or
transmitted in any form or by any means, electronic, mechanical,
photocopying, recording, or otherwise, without the prior written
permission of the editor.



Examining Committee
Prof. Adrian Munteanu Vrije Universiteit Brussel Promoter
Prof. Peter Schelkens Vrije Universiteit Brussel Promoter
Prof. Hugo Thienpont Vrije Universiteit Brussel Committee chair
Prof. Rik Pintelon Vrije Universiteit Brussel Committee vice-chair
Prof. Joeri Barbarien Vrije Universiteit Brussel Committee secretary
Prof. Francisco Morán Burgos Universidad Politécnica de Madrid Member
Dr. Alin Alecu Oracle Corporation Member
Prof. Theo D'Hondt Vrije Universiteit Brussel Member
Prof. Jan Cornelis Vrije Universiteit Brussel Member













To my family





TABLE OF CONTENTS
ACKNOWLEDGMENTS III
ABSTRACT V
CHAPTER 1 INTRODUCTION 1
1.1 MOTIVATION................................................................................................1
1.1.1 Compression and Scalability ...............................................................1
1.1.2 Distortion Metrics ................................................................................3
1.1.3 Error-resilience ...................................................................................4
1.2 OUTLINE.......................................................................................................4
CHAPTER 2 MESHGRID OVERVIEW 7
2.1 INTRODUCTION ............................................................................................7
2.2 MESHGRID REPRESENTATION.....................................................................9
2.2.1 3D Wavelet Decomposition and RG Coding Algorithm ..................... 13
2.2.2 Compression Performance ................................................................. 15
2.3 MESHGRID FEATURES ............................................................................... 16
2.3.1 Scalability .......................................................................................... 16
2.3.2 Animation and Morphing ................................................................... 20
2.3.3 Streaming ........................................................................................... 22
2.4 CONCLUSIONS ............................................................................................ 23
CHAPTER 3 WAVELET-BASED L-INFINITE CODING OF MESHES 25
3.1 INTRODUCTION .......................................................................................... 25
3.2 DISTORTION METRICS............................................................................... 27
3.2.1 L-1 and L-2 Distortion Metrics .......................................................... 27
3.2.2 L-infinite Distortion Metric ................................................................ 28
3.3 NEAR-LOSSLESS L-INFINITE-ORIENTED DATA COMPRESSION ................ 29
3.4 THE SMALLEST UPPER BOUND OF THE L-INFINITE DISTORTION IN
LIFTING BASED WAVELET TRANSFORMS ................................................. 36
3.4.1 The Lifting-based Wavelet Transform ................................................ 36
3.4.2 The Maximum Absolute Difference (MAXAD) ................................... 40
3.4.3 MAXAD Examples ............................................................................. 43
3.5 SCALABLE L-INFINITE CODING OF MESHES ............................................. 51
3.5.1 Scalable Mesh Coding Techniques .................................................... 51
3.5.2 Distortion Formulation ...................................................................... 54
3.5.3 Scalable L-infinite Coding Systems .................................................... 57
3.5.4 L-infinite Distortion Estimators ......................................................... 60

3.5.5 Rate-Distortion Optimization Algorithm ........................................... 65
3.6 RELATION BETWEEN MAXAD AND THE HAUSDORFF DISTANCE .......... 67
3.7 MESHGRID INSTANTIATION ..................................................................... 69
3.8 EXPERIMENTAL RESULTS ......................................................................... 70
3.8.1 Error Distribution ............................................................................. 70
3.8.2 L-infinite Scalability .......................................................................... 73
3.8.3 Distortion Metrics Comparison: L-2 vs. L-infinite ........................... 76
3.8.4 Distortion Metrics Comparison: Theoretical vs. Statistical L-infinite 85
3.9 CONCLUSIONS ........................................................................................... 88
CHAPTER 4 SCALABLE ERROR-RESILIENT CODING OF MESHES 91
4.1 INTRODUCTION ......................................................................................... 91
4.2 ERROR-RESILIENT MESH CODING TECHNIQUES .................................... 92
4.2.1 Mesh Partitioning Techniques .......................................................... 92
4.2.2 Progressive Mesh Coding Techniques .............................................. 93
4.3 SCALABLE JOINT SOURCE AND CHANNEL CODING OF MESHES ............. 97
4.3.1 JSCC Formulations ........................................................................... 99
4.3.2 Optimized Rate-Allocation .............................................................. 101
4.3.3 Low-Density Parity-Check Codes ................................................... 103
4.4 EXPERIMENTAL RESULTS ....................................................................... 104
4.4.1 UEP Performance Overview ........................................................... 105
4.4.2 UEP vs. Equal Error Protection ..................................................... 107
4.4.3 UEP vs. State of the Art .................................................................. 110
4.4.4 Graceful Degradation ..................................................................... 112
4.5 DEMONSTRATION OF SCALABLE CODING AND TRANSMISSION FOR
MESHGRID .............................................................................................. 120
4.6 CONCLUSIONS ......................................................................................... 122
CHAPTER 5 CODING OF DYNAMIC MESHES BASED ON MESHGRID 123
5.1 INTRODUCTION ....................................................................................... 123
5.2 DYNAMIC-MESH CODING APPROACH ................................................... 123
5.3 EXPERIMENTAL RESULTS ....................................................................... 126
5.4 CONCLUSIONS ......................................................................................... 134
CHAPTER 6 CONCLUSIONS AND PROSPECTIVE WORK 135
6.1 CONCLUSIONS ......................................................................................... 135
6.2 PROSPECTIVE WORK .............................................................................. 137
LIST OF PUBLICATIONS 139
REFERENCES 141
ACRONYMS 149

ACKNOWLEDGMENTS

A few more paragraphs, and my thesis is complete. It is a moment of great
joy for me, not only because writing it was tedious work, but especially
because this thesis symbolizes the end of a long journey, started years ago. I had to
face many challenges along the way, and, without the guidance and support coming
from many directions, I would probably not have reached the end of it. Therefore,
this is the moment when I look back and try to express in words my gratitude and
appreciation to everyone who helped me along the way.
I will start by thanking Prof. Jan Cornelis and Prof. Peter Schelkens for giving me
the opportunity of starting this PhD in the first place, and for their continuous effort
in creating an increasingly stimulating work environment where nothing is
impossible.
I want to express my most sincere gratitude to my promoters Prof. Adrian
Munteanu and Prof. Peter Schelkens for their constant support, and for following
and guiding my work during all these years. Furthermore, Prof. Adrian Munteanu
has not only been there all this time as an advisor and mentor, but has also
allocated extra time and effort to meticulously assist me in my research and to
thoroughly revise my publications and this document. His insights were essential
in climbing some of the steep hills encountered during this PhD journey, while his
comments and suggestions have significantly contributed to the correctness and clarity of this
text.
I especially thank Dr. Alexandru I. Salomie, who guided my first steps in this
unforgettable PhD challenge. He has patiently introduced me to this unexplored
world and played a very important role in many aspects of this thesis.
I wish to thank also Dr. Alin Alecu for his invaluable support, beginning with my
first attempts in scientific writing and continuing with many important aspects of my
research.
I would also like to thank Prof. Hugo Thienpont, Prof. Rik Pintelon, Prof. Joeri
Barbarien, Prof. Francisco Morán Burgos, Dr. Alin Alecu, Prof. Theo D'Hondt and
Prof. Jan Cornelis for accepting to be the members of my PhD jury.
Then, I would like to thank my colleagues and friends at the ETRO department,
who have contributed to making the daily environment an enjoyable and stimulating
place: Silviu, Mihai, Augustin, Oana, Aneta, Nikos, Andreea, Salua, Steven, Jan,

Frederik V. (aka Freddy), Tim B., Freya, Shahid, Leon, Yiannis, Dirk, Joeri, Dieter,
Tom, Tim D., Ann, Bart, Guan, Basel, Maxine, Fabio.
Last but not least, I thank my family for their love, patience, permanent
support and endless encouragement during all these years, and for making the
distance from home seem more bearable.

Dan C. Cernea
Brussels, September 24, 2009



ABSTRACT
The dissertation mainly focuses on two topics in the field of scalable coding of meshes.
The first topic introduces the novel concept of local error control in mesh geometry encoding.
In contrast to traditional mesh coding systems that use the mean-square error as target
distortion metric, this dissertation proposes a new L-infinite mesh coding approach, for which
the target distortion metric is the L-infinite distortion. In this context, a novel wavelet-based
L-infinite-constrained coding approach for meshes is proposed, which ensures that the
maximum error between the original and decoded meshes is lower than a given upper bound.
Furthermore, the proposed system achieves scalability in L-infinite sense, that is, any
decoding of the input stream will correspond to a perfectly predictable L-infinite distortion
upper bound. Two distortion estimation approaches are presented, expressing the L-infinite
distortion in the spatial domain as a function of quantization errors produced in the wavelet
domain. Additionally, a fast algorithm for solving the rate-distortion optimization problem is
conceived, enabling a real-time implementation of the rate-allocation. An L-infinite codec
instantiation is proposed for MESHGRID, which is a scalable 3D object encoding system, part
of MPEG-4 AFX. The advantages of scalable L-infinite coding over L-2-oriented coding are
experimentally demonstrated. One concludes that the proposed L-infinite coding approach
guarantees an upper-bound on the local error in the decoded mesh, it enables a fast real-time
implementation of the rate-allocation, and it preserves all the scalability features and
animation capabilities of the employed scalable mesh codec.
The second topic presents a new approach for Joint Source and Channel Coding (JSCC) of
meshes, simultaneously providing scalability and optimized resilience against transmission
errors. An unequal error protection approach is followed, to cope with the different error-
sensitivity levels characterizing the various resolution and quality layers produced by the
input scalable source codec. The number of layers and the protection levels to be employed
for each layer are determined by solving a joint source and channel coding problem. In this
context, a novel fast algorithm for solving the optimization problem is conceived, enabling a
real-time implementation of the JSCC rate-allocation. A JSCC instantiation based on
MESHGRID is proposed. Numerical results show the superiority of the L-infinite norm over the
classical L-2 norm in a JSCC setting. One concludes that the proposed JSCC approach offers
resilience against transmission errors, provides graceful degradation, enables a fast real-time
implementation, and preserves all the scalability features and animation capabilities of the
employed scalable mesh codec.







Chapter 1
INTRODUCTION

1.1 MOTIVATION
1.1.1 Compression and Scalability
Nowadays, an increasing number of applications in various domains such as
entertainment, design, architecture and medicine make use of 3D computer graphics.
Additionally, the increasing demand in mobility has led to an abundance of
terminals, varying from low-power mobile devices to high-end portable computers.
Furthermore, the 3D models are obtained from various sources such as modeling
software and 3D scanning. To achieve a high level of realism, complex models are
required, which usually demand a huge amount of storage space and/or transmission
bandwidth in the raw data format. As the number and the complexity of existing 3D
meshes increase explosively, higher resource demands are placed on storage space,
computing power, and network bandwidth. Among these resources, the network
bandwidth is the most severe bottleneck in network-based graphic applications that
demand real-time interactivity. In this case, even more important than compact
storage, is the possibility to scale the complexity of the surface representations
according to the capacity of the digital transmission channels or to the performance
of the graphics hardware on the target platform. Thus, it is essential to represent the
graphics data efficiently, in a compact and, at the same time, scalable manner. This
research area has received a lot of attention since the early 1990s, and there has been
a significant amount of progress along this direction over the last decade.
Early research on 3D mesh compression concentrated on single-rate compression
techniques to save storage space or bandwidth between CPU and the graphics card.
In a single-rate 3D mesh compression algorithm, the data is analyzed and processed
as a whole; in other words, the original mesh can be reconstructed only if the
encoded bit stream is entirely available. While this is acceptable in local usage
scenarios, it is difficult or even unfeasible in distributed environments like the
Internet. Therefore, the focus in the research community has shifted towards

progressive compression and transmission of meshes. When progressively
compressed, a 3D mesh can be reconstructed increasingly from coarse to fine levels
of detail (LODs) while the bit stream is being received. Examples of various
scalability modes are given next: Figure 1-1 demonstrates the scalability in mesh
resolution, by which the number of vertices at each resolution level is progressively
increased, while Figure 1-2 illustrates scalability in quality, by which the accuracy
of the position of each vertex is progressively increased. These examples show that
progressive compression can enhance the user experience, since a low resolution
reconstruction can be available early on, and the transmission can be interrupted
whenever desired or necessary.

Figure 1-1: The Bunny model reconstructed at different resolution levels, from a
low resolution (left) to a high resolution (right).

Figure 1-2: The Venus model reconstructed at a constant resolution level, but at
different levels of quality, from coarse (left) to fine (right).
A solution for representing and transmitting 3D graphics on a wide range of
terminals with various characteristics in terms of resolution, quality and bandwidth
has been offered recently by MESHGRID. MESHGRID is a scalable mesh coding
technique, providing a quality-and-resolution scalable representation of the 3D

object, as well as region-of-interest coding and client-view adaptation. These
characteristics, along with other advantages, made it the mesh representation format
of choice for our experiments. Therefore, Chapter 2 gives a short overview of the
MESHGRID representation, presenting in more details this mesh compression
technology and its features.
1.1.2 Distortion Metrics
Regarding the quality of the reconstruction, the 3D graphics compression
algorithms can be divided in roughly two categories, depending on whether they
provide lossless or lossy reconstruction. While there are some applications where a
lossless representation is compulsory, like for instance medicine, in most of the
cases a certain error is acceptable for the reconstructed 3D data, while allowing for
much higher compression ratios. Therefore, lossy or near-lossless compressions are
suitable for a broad range of applications, but an appropriate distortion measure
needs to be employed in order to accurately quantify and control the distortion
incurred by the compression system.
Little attention has been given to the area of distortion measurement in the case of
3D graphics lossy compression. The distortion measure commonly used in image
and video coding, i.e. the Mean Squared Error (MSE), has been generally employed
for 3D data as well. The MSE is an average distortion measure, giving a good
approximation of the global error and an expression of the overall perceptual
quality. One of its major drawbacks consists in the fact that it does not exploit
local knowledge about the signal of interest. The local error behavior is lost, due
to an averaging of the reconstruction error throughout the entire data. However,
there are applications that require imposing a tight bound on the individual elements
of the error signal, i.e. constraining the elements of the reconstruction error signal to
be bounded by some given thresholds. This is especially desired in the case of a
mesh representation, where a large error for a single vertex can translate to
considerable visual distortions. Therefore, a new distortion measure is needed to
address these issues in the case of 3D graphics.
As an answer, the L-infinite norm criterion has been proposed as a candidate for a
perceptually meaningful norm, in that the distortion provides a good approximation
of the maximum local error. In Chapter 3, we propose the novel concept of local
error control in lossy coding of meshes. With this respect, a scalable L-infinite mesh
coding approach is proposed, simultaneously performing local error control and
providing scalability in L-infinite sense.

1.1.3 Error-resilience
State of the art 3D graphics compression schemes provide bandwidth adaptation
and offer a broad range of functionalities, including quality and resolution
scalability, and view-dependent decoding. In the context of network transmissions
however, they do not address important network factors such as packet losses.
Because of the sensitivity and interdependence of the bitstream layers generated by
these coding techniques, when a packet is lost or corrupted due to transmission
errors, all the following packets must be discarded. Therefore, without appropriate
measures, scalable mesh coding techniques produce bitstreams that are very
sensitive to transmission errors, i.e. even a single bit-error may propagate and cause
the decoder to lose synchronization and eventually collapse. As a result, the decoded
3D model can suffer extreme distortions or even complete reconstruction failure.
Appropriate error protection mechanisms are therefore of vital importance in
transmission over error-prone channels, in order to protect the bitstream against
severe degradations or to reduce the end-to-end delay. This issue is tackled in
Chapter 4, which proposes a novel Joint Source and Channel Coding (JSCC)
technique for meshes, providing optimized resilience against transmission losses and
maintaining the scalability features of the employed scalable source coder.

1.2 OUTLINE
An overview of the structure of this document is provided in this section, and the
major contributions of our work are highlighted.
Chapter 2 overviews the most important scalable mesh coding techniques in the
literature and motivates the choice of MESHGRID, which is the scalable mesh
compression technology used further in our developments. A short overview of the
MESHGRID codec follows, and its main features are emphasized.
The main contributions of this dissertation are presented in Chapter 3 and Chapter
4. In Chapter 3, a novel concept of local error control in mesh geometry encoding is
proposed, for which the target distortion metric is the L-infinite distortion. Thus, a
novel wavelet-based L-infinite-constrained coding approach for meshes is presented,
which ensures that the maximum error between the original and decoded meshes is
lower than a given upper bound. Next, the proposed system is shown to achieve
scalability in L-infinite sense, that is, any decoding of the input stream will
correspond to a perfectly predictable L-infinite distortion upper bound. Two
distortion estimation approaches are presented, expressing the L-infinite distortion
in the spatial domain as a function of quantization errors produced in the wavelet

domain. Additionally, a fast algorithm for solving the rate-distortion optimization
problem is developed, enabling a real-time implementation of the rate-allocation.
Further, error-resilient techniques for meshes are investigated, and a novel joint
source and channel coding approach is proposed in Chapter 4. The proposed
approach provides resilience against transmission errors and, at the same time,
preserves the scalable properties of the bitstream. An unequal error protection
approach is followed, to cope with the different error-sensitivity levels
characterizing the various resolution and quality layers produced by the input
scalable source codec. In this context, a fast algorithm for optimizing the rate
allocation between the source and the channel is presented, which allows for a real-
time implementation of the proposed error protection technique.
Chapter 5 explores the benefits of employing the proposed L-infinite distortion
metric for coding of dynamic mesh sequences. Hence, the concept of L-infinite
mesh coding is extrapolated from static models to dynamic models, and the coding
performance of MESHGRID is evaluated, when used to encode a time-varying
sequence of a 3D mesh.
In the end, Chapter 6 draws the conclusions of this work and sketches the
prospective work related to this dissertation.







Chapter 2
MESHGRID OVERVIEW
2.1 INTRODUCTION
While more and more applications make use of 3D computer graphics, the most
popular representation for 3D objects still today is the uncompressed
IndexedFaceSet model, dating from the early days of computer graphics. Yet, this
simple and straightforward representation has not been designed to deal efficiently
with highly detailed and complex surfaces, consisting of tens to hundreds of
thousands of triangles, necessary to achieve realistic rendering of daily life objects,
measured for instance with laser range scanners or structured light scanners. Even
more important than compact storage, is the possibility to scale the complexity of
the surface representations according to the capacity of the digital transmission
channels or to the performance of the graphics hardware on the target platform.
Another vital issue for the animation of objects is the support for free form modeling
or animation, offered by the representation method. In this context, MPEG-4
Animation Framework eXtension (AFX) [ISO/IEC 2004] has recently standardized
a set of techniques for compact and scalable arbitrary-mesh encoding. The
MPEG-4 AFX techniques include 3D Mesh Coding (3DMC) [Taubin 1998b],
Wavelet Subdivision Surfaces (WSS) [Lounsbery 1997], and our recently proposed
MESHGRID surface representation method [Salomie 2005, Salomie 2004b].
A first category of techniques tries to respect as much as possible the vertex
positions and their connectivity as defined in the initial IndexedFaceSet description,
while the second category opts for re-meshing the original input, by defining a new
set of vertices with specific connectivity properties. This second category of
techniques allows for achieving higher compression ratios and other features, such
as scalability and support for animation. The second approach is certainly more
complex at the encoding stage, since a surface obtained via re-meshing will have to
be fitted within a certain error to the initial mesh description.
For the first category of techniques, the basic approach is to efficiently encode the

connectivity graph, describing for each polygon in the mesh its vertices and their
order; see for example [Rossignac 1999]. In [Taubin 1998b], the Topological
Surgery scheme was proposed to compress the connectivity of manifold polygonal
meshes of arbitrary topological type as well as the vertex locations. A face forest,
spanning the dual graph of the mesh, connects the faces of the mesh, and a vertex
graph connects all the vertices. The coordinates of a vertex are predicted using a
linear combination of the ancestor vertices preceding it in the vertex graph traversal.
The Progressive Forest Split (PFS) scheme described in [Taubin 1998a] provides
scalability by combining a low resolution mesh with a sequence of forest split
refinement operations, balancing compression efficiency with granularity.
Within the second category of techniques, WSS [Lounsbery 1997] exploits the
effectiveness of the wavelet transform in decorrelating the input data. A base mesh
is used as the seed for a recursive subdivision process, during which the 3D details
(i.e. the wavelet coefficients) needed to refine the original shape at every given mesh
resolution are added to the new vertex positions predicted by the subdivision
scheme. The wavelet-transformed mesh is efficiently encoded by employing
zerotree coding techniques [Shapiro 1993] previously developed for image
compression.
The MESHGRID surface representation [ISO/IEC 2004, Salomie 2005, Salomie
2004b], described further in this chapter, lies somewhat in between the two
categories. Features common to the techniques belonging to the first category are
combined with wavelet-based multi-resolution techniques for refining the shape.
The peculiarity of the MESHGRID representation lies in combining a wireframe (i.e.
the connectivity-wireframe), describing in an efficient implicit way the connectivity
between the vertices of the surface, with a regular 3D grid of points (i.e. the
reference-grid), acting as a reference-system for the object.
In the next section, MESHGRID will be described in more detail, since both our
scalable L-infinite coding and joint source and channel mesh coding techniques are
instantiated based on MESHGRID. We note that the core re-meshing technique that
accompanies MESHGRID is a surface extraction method called TRISCAN [Salomie
2001, Salomie 2005, Salomie 2004b]. TRISCAN is used in order to generate multi-
resolution surface representations of the 3D object starting from classical
IndexedFaceSet or implicit surface representations. In this chapter we do not
describe TRISCAN; instead, we refer the interested reader to the literature, see
[Salomie 2001, Salomie 2005, Salomie 2004b].
The remainder of the chapter is structured as follows. Section 2.2 gives an
overview of the MESHGRID surface representation, focusing on the coding
techniques used in order to compress the reference-grid and connectivity wireframe.

Section 2.3 lists the main features that characterize MESHGRID, and finally, section
2.4 draws the conclusions of this chapter.
2.2 MESHGRID REPRESENTATION
MESHGRID [Salomie 2005, Salomie 2004b] is a hierarchical multi-resolution
representation and scalable encoding method for 3D objects. MESHGRID differs
from the other methods present in the literature by the fact that it preserves not only
the surface description, but also the volumetric description of the model
and the relationship between them. The surface description is specified as the union
between a connectivity-wireframe (CW), describing the connectivity between the
vertices, and a 3D grid of points, i.e. the reference-grid (RG), characterizing the
space inside and outside the CW. The particularity of the MESHGRID representation
lies in attaching the vertices of the CW to the RG points. Basically, encoding the
mesh with MESHGRID is equivalent to encoding the vertex positions, given by the
RG, and the connectivity between them, given by the CW. An example illustrating
the decomposition of a MESHGRID object into its components is given in Figure 2-1.


(a) MeshGrid object   (b) Hierarchical CW   (c) RG   (d) Hierarchical RS

Figure 2-1: MESHGRID representation of the Cuboid model consisting of a (i)
connectivity-wireframe (CW) and (ii) reference-grid (RG) represented by a
hierarchical set of reference-surfaces (RS).
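To make this decomposition concrete, the following minimal Python sketch (with hypothetical field names, not the actual MESHGRID data structures) groups the components of a MESHGRID object: the reference-grid as a dense array of 3D points indexed by their discrete position, and the connectivity-wireframe as links between vertices, each vertex being attached to a border RG point through a scalar offset (the offsets and equation (2.1) are discussed later in this section).

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class MeshGridObject:
    """Illustrative container for a MESHGRID model (field names are hypothetical).

    reference_grid    : (Nu, Nv, Nw, 3) array with the coordinates of the RG
                        points; the discrete position (u, v, w) of a point is
                        simply its index in this array.
    vertex_grid_index : (V, 3) integer array giving, for each CW vertex, the
                        discrete position of its border RG point G1.
    vertex_offset     : (V,) array of relative offsets locating each vertex on
                        its grid line (cf. equation (2.1) below).
    connectivity      : connectivity vectors of the CW, stored here simply as
                        directed links (pairs of vertex indices) between
                        vertices lying inside the same reference-surface.
    """
    reference_grid: np.ndarray
    vertex_grid_index: np.ndarray
    vertex_offset: np.ndarray
    connectivity: list = field(default_factory=list)
```

Encoding such an object then amounts, as stated above, to compressing the reference_grid array (wavelet-based, see section 2.2.1), the connectivity (losslessly, per resolution level), and the quantized vertex offsets.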
The reference-grid is defined by the intersection points between three different
sets of reference-surfaces (RS). The discrete position of each RG point represents
the indices of the RSs intersecting in that point, while the coordinates of the RG
point are equal to the coordinates of the computed intersection point. This concept is
shown in Figure 2-2, the RS from each set being displayed in a different color.
We note also that the intersection between any two reference-surfaces belonging
to two different sets will define a so-called RG line. The three sets of RG lines
corresponding to the example in Figure 2-2 (b) are depicted using three different

colors in Figure 2-2 (c). We point out that, in the general case, the RSs are not
planar, but curvilinear and non-equidistant.

(a) (b) (c)
Figure 2-2: Example of the RG and its associated RSs: (a) uniform (regular) and (b)
non-uniform (deformed) RSs, and (c) the RG of (b) displayed separately.
The connectivity-wireframe keeps the 3D connectivity information between the
vertices, and consists of a series of connectivity vectors, each of these vectors
linking two vertices located inside the same RS. Any type of wireframe, whether it
is triangular, quadrilateral, or polygonal, can be attached to a RG. The connectivity
information between the vertices of the mesh is efficiently stored by deriving the
discrete position of the next vertex in the RG from the discrete position of the
previous vertex. Additionally, the normal vector to the surface is obtained by
computing the cross product between the connectivity vectors.
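As a small illustration of the last point, a surface normal at a vertex can be approximated from the cross product of two connectivity vectors emanating from it; the sketch below is generic and does not reflect the exact choice and ordering of connectivity vectors used by MESHGRID.

```python
import numpy as np

def vertex_normal(v, n1, n2):
    """Approximate the surface normal at vertex v from two neighbouring vertices.

    v, n1, n2 : 3D coordinates of the vertex and of two vertices linked to it,
                so that (n1 - v) and (n2 - v) play the role of connectivity vectors.
    """
    n = np.cross(n1 - v, n2 - v)     # cross product of the two connectivity vectors
    return n / np.linalg.norm(n)     # normalize to unit length
```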


Figure 2-3: A cross-section through a 3D object, illustrating the contour of the
object, the RG, and the relation between the vertices (belonging to the CW and
located at the surface of the object) and the grid points.
An example illustrating the construction of the CW is given in Figure 2-3. Each
vertex from the CW is located on a RG line, e.g. the line with label 1, resulting from
the intersection between two RSs, belonging to two different sets. The RG line
passes implicitly through the series of RG points (labels 2, 3) resulting from its
intersection with the RSs belonging to the third set. The vertices of the re-meshed
object (e.g. label 4) are given by the intersection points between the RG lines and
the object's surface. We notice that a RG line might intersect the surface of the

object at different positions; hence, in each intersection point a different vertex must
be considered. For a closed surface, the number of intersection points between a grid
line and the surface is even (grid lines tangent to the surface define multiple
overlapped vertices).
The coordinates of the vertices from the CW do not need to be encoded explicitly,
since their values are derived from the coordinates of the RG points. The procedure
used to make the link between the vertices and the RG points is the following (see
Figure 2-3): (i) find for each vertex $V$ the two grid points $G_1$, located inside the
object, and $G_2$, located outside the object, such that both $G_1$ and $G_2$ are positioned
on the same grid line (label 6) as $V$ and are the closest to $V$, and (ii) consider
$G_1$ as the reference point of $V$. The RG points with similar properties as $G_1$ and
$G_2$ (e.g. labels 2, 3) are called border reference-grid points, since the object's
surface (label 5) passes between these points. We observe also from Figure 2-3 that
it is possible to attach several vertices to the same RG point.
By using the RG, there is no need to store the coordinates of the vertices; instead, only
the discrete positions of the corresponding RG points are stored. Thus, the coordinates
of any vertex $V$ can be computed as the sum of the coordinates of the
corresponding RG point $G_1$ and an offset:

$$V = G_1 + \mathrm{offset} \qquad (2.1)$$
The offset is defined as a relative value instead of an absolute one; this has the
advantage that the coordinates of the vertices can be recomputed from the RG
coordinates after having applied arbitrary deformations to the RG (see right side of
Figure 2-3).
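A plausible way to implement equation (2.1) in code is sketched below (in Python); the interpretation of the offset as a scalar in [0, 1) measured along the grid line from the interior border point G1 towards the exterior border point G2 is an assumption, since the text above only states that the offset is a relative value.

```python
import numpy as np

def vertex_position(G1: np.ndarray, G2: np.ndarray, offset: float) -> np.ndarray:
    """Recompute a vertex from its border RG points and a relative offset.

    G1, G2 : coordinates of the border reference-grid points (G1 inside,
             G2 outside the object) on the grid line carrying the vertex.
    offset : relative position of the vertex between G1 and G2; 0.5 corresponds
             to the 'default' vertex position mentioned in section 2.2.2.
    """
    return G1 + offset * (G2 - G1)   # V = G1 + offset, the offset being relative to the grid spacing

# Because the offset is relative, deforming the grid (i.e. moving G1 and G2)
# automatically carries the vertex along with it:
V_before = vertex_position(np.array([1.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0]), 0.5)
V_after  = vertex_position(np.array([1.5, 0.5, 0.0]), np.array([2.5, 0.5, 0.0]), 0.5)
```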
We point out also that the RG is a smooth vector field defined on a regular
discrete 3D space. In our approach, the RG is efficiently compressed using a
scalable 3D wavelet-based intra-band coding algorithm. The RG coding algorithm
operates in resolution-scalable mode and encodes the wavelet coefficients of the
wavelet-transformed RG coordinates in a bitplane-by-bitplane manner, using
quadtree-based coding strategies [Salomie 2005, Salomie 2004b]. The CW is
losslessly encoded at each spatial resolution using a 3D extension of chain-codes
[Salomie 2005, Salomie 2004b]. More details about the employed wavelet transform
and RG encoding algorithm are given in section 2.2.1.
Overall, MESHGRID allows for lossy to near-lossless encoding, and yields a single
multi-scalable compressed bitstream from which appropriate subsets, producing
different visual qualities and resolutions, can be extracted to meet the resolution,
quality and bit-rate requirements of each client terminal used for visualization. An
example illustrating the resolution scalability is given in Figure 2-4.


Figure 2-4: The hierarchical Rabbit MESHGRID model consisting of six resolution
levels for both the connectivity-wireframe and the reference-grid. The resolution
varies, from left to right, from 87336 triangles to 72 triangles.
The close association between the CW and the RG allows for an efficient
encoding of the model (see section 2.2.2), and provides flexible animation and
modeling capabilities (see section 2.3) [Preda 2003]. The uniqueness of MESHGRID
stems from its hybrid nature: MESHGRID is a hybrid object representation storing
both the surface (i.e. the connectivity-wireframe) and the volumetric information
(i.e. the reference-grid). This finds applications in different fields of activity
employing both surface and volumetric data. A typical application is in the medical
field. The common way to render and visualize the volumetric data is to enable
transparency for objects located at the outside (e.g. skin) in order to view the
internal objects (e.g. blood vessels, organs, bones), as shown in the example of
Figure 2-5. MESHGRID allows for encoding the objects extracted from volumetric
data in a compact way, and it is well suited for streaming and displaying these
models at remote locations. The full list of features characterizing MESHGRID is
presented in section 2.3.
In the next section, we will dive shortly into the details of the employed wavelet
transform and reference-grid encoding algorithm, as they are intensively exploited in
this dissertation. Section 2.2.2 presents some experiments intended to demonstrate
the efficiency of the MESHGRID system.


Figure 2-5: A composite MESHGRID model consisting of two surface layers and one
reference-grid. The external surface layer is shown with transparency to allow
displaying the internal surface layer. Each surface layer may consist of several
disjoint connectivity-wireframes.

2.2.1 3D Wavelet Decomposition and RG Coding Algorithm
In the beginning of this section, we derive the lifting factorizations [Sweldens
1998] of the wavelet transform used by MESHGRID [Salomie 2004a]. These
factorizations are needed in the derivations of section 3.5.4. The one-dimensional
forward and inverse wavelet transforms respectively, can be expressed via lifting
[Sweldens 1998] as follows:

$$
s_i^{(0)} = x_{2i}, \quad d_i^{(0)} = x_{2i+1}, \quad
d_i^{(1)} = d_i^{(0)} - \frac{9}{16}\left(s_i^{(0)} + s_{i+1}^{(0)}\right) + \frac{1}{16}\left(s_{i-1}^{(0)} + s_{i+2}^{(0)}\right), \quad
s_i^{(1)} = s_i^{(0)},
\qquad (2.2)
$$

$$
s_i^{(0)} = s_i^{(1)}, \quad
d_i^{(0)} = d_i^{(1)} + \frac{9}{16}\left(s_i^{(0)} + s_{i+1}^{(0)}\right) - \frac{1}{16}\left(s_{i-1}^{(0)} + s_{i+2}^{(0)}\right), \quad
x_{2i+1} = d_i^{(0)}, \quad x_{2i} = s_i^{(0)}.
\qquad (2.3)
$$

In these equations, $x_{2i}$ and $x_{2i+1}$ represent the even and odd samples respectively in
the input signal $x$, and $s_i^{(k)}$, $d_i^{(k)}$ represent the $i$-th approximation and detail
coefficients respectively, computed at the lifting step $k$, $k = 0, 1$. The three-
dimensional wavelet transform employed by MESHGRID is a straightforward
implementation of a series of one-dimensional wavelet transforms sequentially
performed in three different directions.
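As an illustration, the following Python sketch implements the one-dimensional lifting steps of equations (2.2) and (2.3); the symmetric extension used at the signal borders is an assumption, as boundary handling is not specified above. Applying the one-dimensional transform along each of the three axes in turn yields the separable 3D transform.

```python
import numpy as np

def _predict(s):
    """Prediction P_i = 9/16 (s_i + s_{i+1}) - 1/16 (s_{i-1} + s_{i+2}),
    using symmetric extension at the borders (an assumption of this sketch)."""
    se = np.concatenate(([s[1]], s, [s[-2], s[-3]]))   # s_{-1}, s_0 .. s_{n-1}, s_n, s_{n+1}
    i = np.arange(len(s))
    return 9/16 * (se[i + 1] + se[i + 2]) - 1/16 * (se[i] + se[i + 3])

def forward_lifting_1d(x):
    """Forward transform of eq. (2.2): split into even/odd samples, then
    replace each odd sample by its prediction error."""
    s, d = x[0::2].astype(float), x[1::2].astype(float)   # s_i^(0) = x_{2i}, d_i^(0) = x_{2i+1}
    d = d - _predict(s)                                    # d_i^(1) = d_i^(0) - P_i
    return s, d                                            # s_i^(1) = s_i^(0)

def inverse_lifting_1d(s, d):
    """Inverse transform of eq. (2.3): undo the prediction step and interleave."""
    d = d + _predict(s)                                    # d_i^(0) = d_i^(1) + P_i
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = s, d                                # x_{2i} = s_i^(0), x_{2i+1} = d_i^(0)
    return x

# Perfect reconstruction check on a random even-length signal.
x = np.random.rand(16)
assert np.allclose(inverse_lifting_1d(*forward_lifting_1d(x)), x)
```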
Concerning the RG coding algorithm, each RG component is coded separately, by

means of a progressive multi-resolution algorithm based on a combination of a 3D
wavelet transform and an intra-band volumetric wavelet coder, called Cube
Splitting. Cube Splitting is the 3D extension of the SQP (Square Partitioning)
algorithm proposed in [Munteanu 1999b]. This coding/decoding approach supports
quality scalability, resolution scalability, and ROI coding/decoding.
In a first step, the coefficients generated by the wavelet transform undergo a
scaling before they are coded with the Cube Splitting algorithm. This scaling
operation ensures that the resulting wavelet transform is approximately unitary
[Schelkens 2003]. In this way, distortions occurring in the wavelet subbands are
equally reflected in the spatial domain, i.e., in the distortion sense, they have the same
impact in the spatial domain. In a second step, the coefficients are quantized using
successive approximation quantization (SAQ), by which the significance of the
wavelet coefficients with respect to a series of dyadically reduced thresholds of the
form $T_b = 2^b$, $0 \le b \le b_{\max}$, is determined.
In a third step, the coefficients are encoded in a bit-plane by bit-plane fashion,
starting from the most significant bitplane, corresponding to $b_{\max}$, and ending with
the least-significant bit-plane, corresponding to $b = 0$. Conceptually, there are two
coding passes applied for each bit-plane, i.e. a significance pass and a refinement
pass. For the most significant bitplane only the significance pass is performed.
For a given bitplane $b$, the significance pass encodes the locations $\mathbf{k}$ of the
wavelet coefficients $w(\mathbf{k})$ that (i) were not significant with respect to the
previously applied threshold $T_{b+1}$, i.e. $\left|w(\mathbf{k})\right| < T_{b+1}$, and that (ii) became significant
with respect to the currently applied threshold $T_b$, i.e. $\left|w(\mathbf{k})\right| \ge T_b$. In order to
encode the locations of these newly found significant wavelet coefficients, the Cube
Splitting algorithm constructs and encodes octree binary structures. The highest
node in the tree corresponds to the entire wavelet-transformed RG. Each node in the
octree indicates whether it contains at least one significant wavelet coefficient or
not. In the negative case, a non-significant symbol is associated with the node. In the
positive case, a significant symbol is associated with the node, and the node is split
into eight corresponding nodes. The octree decomposition process is carried out
recursively for significant nodes until all significant wavelet coefficients are isolated
and the octree binary structure for the entire bit-plane is constructed. The octree
nodes corresponding to single pixels (i.e. single wavelet coefficients) record also the
signs of the corresponding coefficients.
Encoding the octree data-structures is performed by visiting the octree in a depth-
first manner and by writing the corresponding node symbols in the output stream.
We note that the significance of a node in the significance pass needs to be encoded
only once: obviously, once a node becomes significant with respect to a threshold

$T_p$, it will keep on being significant for all lower thresholds $T_b$, $1 \le T_b < T_p$. We
note also that as soon as a coefficient becomes significant with respect to an applied
SAQ threshold, its sign is encoded as well.
The refinement pass, corresponding to an arbitrary bitplane $b$, $0 \le b \le b_{\max} - 1$,
encodes the binary value at position $b$ in the binary representation of all wavelet coefficients
that have been found to be significant in the previous significance passes $p$, $b < p \le b_{\max}$.
Performing the significance and refinement passes for each bitplane $b$, $b_{\max} \ge b \ge 0$,
allows for progressively refining the reconstructed wavelet
coefficients at the decoder side. Hence, MESHGRID achieves quality scalability by
relying on embedded quantization and bitplane coding and inherently provides
resolution scalability by exploiting the multiresolution nature of the wavelet
transform. For a more detailed description of SQP, quadtree coding of wavelet
subbands and Cube Splitting algorithms, the reader is referred to [Munteanu 2003,
Schelkens 2003].
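A compact sketch of the bitplane coding loop described above is given below; it records, per bitplane, the octree split symbols of the significance pass and the refinement bits, but the symbol values, sign convention and traversal details are illustrative only and do not reproduce the actual MESHGRID bitstream syntax (integer coefficient magnitudes and at least one non-zero coefficient are also assumed).

```python
import numpy as np

def cube_split_encode(w):
    """Sketch of SAQ + octree (Cube-Splitting-like) bitplane coding of a 3D
    block of wavelet coefficients w, using thresholds T_b = 2**b."""
    aw = np.abs(w)
    b_max = int(np.floor(np.log2(aw.max())))          # assumes aw.max() >= 1
    bitstream = []

    def octants(lo, hi):
        """Split the index box [lo, hi) into its (non-empty) eight octants."""
        mids = [(l + h + 1) // 2 for l, h in zip(lo, hi)]
        for cx in ((lo[0], mids[0]), (mids[0], hi[0])):
            for cy in ((lo[1], mids[1]), (mids[1], hi[1])):
                for cz in ((lo[2], mids[2]), (mids[2], hi[2])):
                    child_lo, child_hi = (cx[0], cy[0], cz[0]), (cx[1], cy[1], cz[1])
                    if all(l < h for l, h in zip(child_lo, child_hi)):
                        yield child_lo, child_hi

    def significance_pass(lo, hi, T, out):
        box = tuple(slice(l, h) for l, h in zip(lo, hi))
        was_significant = bool((aw[box] >= 2 * T).any())  # node symbol already sent at a higher threshold
        if not was_significant:
            sig = bool((aw[box] >= T).any())
            out.append(int(sig))                          # node symbol, encoded only once
            if not sig:
                return
        if all(h - l == 1 for l, h in zip(lo, hi)):       # a single coefficient is isolated
            if not was_significant:
                out.append(int(w[lo] < 0))                # sign of the newly significant coefficient
            return
        for child_lo, child_hi in octants(lo, hi):        # depth-first octree recursion
            significance_pass(child_lo, child_hi, T, out)

    for b in range(b_max, -1, -1):
        sig_symbols, refinement_bits = [], []
        significance_pass((0, 0, 0), w.shape, 2 ** b, sig_symbols)
        if b < b_max:                                     # no refinement pass for the first bitplane
            already = aw >= 2 ** (b + 1)                  # significant w.r.t. a previous threshold
            refinement_bits = list((aw[already].astype(int) >> b) & 1)
        bitstream.append((b, sig_symbols, refinement_bits))
    return bitstream
```

Truncating the resulting list of (bitplane, significance symbols, refinement bits) layers at any point yields a coarser but valid approximation of the coefficients, which is precisely the embedded, quality-scalable behaviour described above.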
2.2.2 Compression Performance
The connectivity-information can be losslessly encoded at four bits per vertex
even without entropy coding, which offers a clear advantage in representing objects
derived from discrete data sets in a compact and lossless way. In addition, the
MESHGRID stream may contain the coding of the reference-grid and the vertex
offsets relative to the grid points.
In order to illustrate the coding performance of the MESHGRID encoder, the same
mesh description has been compressed using both 3DMC and MESHGRID. In the
case of 3DMC [Taubin 1998b], the input mesh description has been represented as
an IndexedFaceSet. The shaded surfaces of the reconstructed mesh compressed at
different bit rates, and the corresponding sizes of the multi-resolution 3DMC and
MESHGRID streams are shown in Figure 2-6 on the first and second rows respectively.
The reference-grid has been chosen to be uniformly distributed. Hence, the bit
rate for the MESHGRID stream is equal to four bits per vertex due to the
connectivity coding, plus n bits per offset (bpo, as specified for each case)
representing the number of bits used to quantize the vertex offsets. For the visually
lossless case, shown in the last column of Figure 2-6, the ratio between the size of
the multi-resolution 3DMC and MESHGRID streams is roughly 3.5. In single-
resolution mode, this ratio drops to 1.5. Notice that 3DMC takes advantage of
arithmetic coding, while MESHGRID does not employ entropy-coding techniques.
Notice also that the view-dependent MESHGRID stream introduces an overhead in
the range of 10% to 25%, depending on the relative size of the ROI with respect to
the size of the reference-grid.




3DMC: 2056 bytes, 2434 bytes, 3088 bytes, 4095 bytes, 8071 bytes
MESHGRID: 1128 bytes, 1515 bytes, 1709 bytes, 1902 bytes, 2288 bytes

Figure 2-6: Visual comparison of the Torus object consisting of 1546 vertices,
encoded with 3DMC (first row of images) and MESHGRID (second row of images)
respectively.
Quadrilateral meshes are particular types of meshes that satisfy the constraints
imposed by the connectivity-wireframe, and therefore can be efficiently represented
using MESHGRID. In particular, if the quadrilateral mesh fulfills the requirement of a
subdivision surface, i.e. each higher resolution of the mesh can be obtained from the
immediate lower level by performing a uniform split of each quad into four sub-
quads, then it can be encoded very efficiently by using MESHGRID. In this case, the
connectivity between the vertices needs to be specified only for the lowest
resolution-level; this already introduces a gain in the range of 2 to 2.6 bits per vertex
in comparison to the case when the connectivity is encoded at each resolution level.
We point out also that the reference-grid can be non-uniformly distributed; thus, it
can be defined in such a way that each vertex offset is equal to 0.5, corresponding to
the default vertex position with respect to the reference-grid. In this case, the offsets
do not need to be encoded at all, which offers an additional (considerable) coding
gain compared to the case when offsets are encoded for each vertex.
2.3 MESHGRID FEATURES
In this section, we illustrate in more detail the most important features of the
MESHGRID representation method.
2.3.1 Scalability
In the multi-resolution MESHGRID representation, both the CW (see Figure 2-7)
and the RG (see Figure 2-8) have a hierarchical structure. The hierarchical structure
enforces that vertices found in a lower level are available in all higher levels that
follow. However, each level will alter the connectivity between the existing vertices.

For example, vertex $v_n^l$ in Figure 2-7 is connected on level $l$ with vertex $v_m^l$ via the
blue colored line; this link is replaced in level $l+1$ by another line (green color) to
vertex $v_p^{l+1}$, and replaced again in level $l+2$ by the red line that connects it to
vertex $v_q^{l+2}$, and so on. Note that the level of a vertex indicates the position in the
hierarchy when it first appears.

Figure 2-7: The hierarchical connectivity-wireframe of the MESHGRID
representation.
The hierarchical MESHGRID representation imposes the following constraint on
the reference-system: the reference-system of any level is a super-set of the
reference-system of the lower levels. Figure 2-8 shows the changes in the reference-
system when generating a hierarchical MESHGRID with three levels. The first level
in Figure 2-8(a) consists of three RSs colored in blue. The second level in Figure
2-8(b) has in addition other RSs (colored in green), while the third level (Figure
2-8(c)) contains the RSs of the previous levels as well as the RSs colored in red. We
notice that each higher level in the hierarchy keeps the RSs of the previous levels
and adds an extra RS in between two existing RSs from the previous levels.

(a) (b) (c)

Figure 2-8: The changes in the reference-system when generating a hierarchical
MeshGrid with 3 levels: (a) first level consisting of 3 RSs shown in blue, (b) the
second level adding the RSs shown in green, and (c) third level, containing the RSs
from the previous levels and the new ones shown in red.


(a) 1089 vertices (b) 4225 vertices

(c) 16641 vertices (d) 66049 vertices
Figure 2-9: Quadrilateral MESHGRID model: different resolution-levels obtained by
homogeneous split operations.
The hierarchical nature of the RG and CW inherently brings spatial and quality
scalability. An example is given in Figure 2-9, which illustrates the capability of
MESHGRID to provide mesh-resolution scalability for the particular case of
quadrilateral meshes. A second example is shown in Figure 2-10, illustrating the
scalability in shape precision (or quality scalability) obtained by decoding the
MESHGRID model at four different rates. The connectivity-wireframe has been
reconstructed at the finest resolution-level (shown in Figure 2-9 (d)), while the
reference-grid has been progressively refined at different quality levels. The bit-rates
of Figure 2-10 are the overall rates for the entire MESHGRID model.


(a) 2.07 bits/vertex (b) 3.107 bits/vertex

(c) 4.246 bits/vertex (d) 7.05 bits/vertex
Figure 2-10: Quadrilateral MESHGRID model: visual comparison of the mesh
reconstructed at the last resolution-level (of Figure 2-9 (d)) at different bitrates.
We notice also that both the CW and the RG can be encoded at each resolution-
level either globally (i.e. for the entire object), or in separate regions of interest
(ROIs). ROI coding is an important functionality required in order to enable view-
dependent decoding. With this respect, the RG can be divided into ROIs, and the
surface can be encoded locally in each of these ROIs. The global encoding
corresponds to defining a single ROI encompassing the complete RG. An example
of a view-dependent mode is shown in Figure 2-11, illustrating the use of ROIs at
different resolution-levels.


Figure 2-11: A multi-resolution MESHGRID can be coded as one mesh (view-
independent coding) or split into several ROIs (view-dependent coding).
In the end of this section, we want to draw attention towards another important
dimension in scalability, namely multi-core scalability. Nowadays, the processing
units have the tendency to evolve towards spreading the computational power across
multiple cores. Therefore, in order to take advantage of the entire processing power,
an application should be developed following a multi-core design. In this context, a
multicore algorithm can be designed for both the MESHGRID encoder and decoder,
exploiting the features presented in this section, i.e. the hierarchical CW and RG
coding and the division of the mesh into ROIs. An example of such an architecture is
demonstrated in Figure 2-12.


Figure 2-12: A multi-core architecture of a MESHGRID encoder, designed for four
cores (the cores allocation is represented by the colored blocks).

2.3.2 Animation and Morphing
In addition to the classical vertex-based animation, the MESHGRID representation
allows for specific animation capabilities, such as (i) reshaping on a hierarchical
basis of the RG and its attached vertices, and (ii) rippling effects by changing the
position of the vertices relative to corresponding RG points.
The former type of animation can be done on a hierarchical multi-resolution
basis: deforming the reference-grid for a lower resolution mesh will propagate the
action to the higher levels, while applying a similar deformation at the higher
resolution-levels will only have a local impact. The vertices of the wireframe, and

therefore the shape of the surface, are updated each time the RG points are altered.
In this sense, we illustrate here the animation of the Humanoid model, depicted in
Figure 2-13, realized by keeping the CW unchanged for the entire sequence and
modifying only the RG coordinates. Given that the vertices are attached to the RG
and their coordinates are derived from the coordinates of the RG points, the
animation of the RG points can be used to obtain the same effect as a direct
animation of the vertices [Salomie 2005, Salomie 2004b]. A part of the reference-
grid is displayed together with the shaded surface, in order to illustrate the impact of
the animation on the mesh and grid. The advantage of using RG-based animation is
that the animation can be defined in a hierarchical and more straightforward manner
[Preda 2003, Salomie 2005, Salomie 2004b].

Figure 2-13: Animation of the Humanoid model by altering the positions of the
reference-grid points.
Additionally, MESHGRID has the possibility of encoding oscillating or low
amplitude animations, such as ripple or wave effects. In this case, the differences
between successive frames can be encoded at the level of the vertex offsets, which is
very compact because an offset is only a scalar value. When decoding, any changes
in the offsets will trigger, according to equation (2.1), an update of the vertex
coordinates. The animation example from Figure 2-14 simulates the propagation of
a wave. It is a typical animation example of an elevation model.


(a) (b) (c)

Figure 2-14: An animation example accomplished by using rippling effects.
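As a small illustration of this mechanism, the sketch below (reusing the hypothetical vertex_position helper from section 2.2, and with made-up wave parameters) updates only the scalar offsets from frame to frame; re-applying equation (2.1) at the decoder then moves the vertices and produces the ripple.

```python
import numpy as np

def ripple_offsets(base_offsets, xy, t, amplitude=0.2, wavelength=4.0, speed=1.0):
    """Per-frame vertex offsets simulating a radial wave on an elevation model.

    base_offsets : (V,) default offsets (e.g. all 0.5)
    xy           : (V, 2) horizontal positions of the vertices' RG points
    t            : frame time; between frames only the offset differences
                   would need to be encoded, each offset being a single scalar.
    """
    r = np.linalg.norm(xy, axis=1)
    return base_offsets + amplitude * np.sin(2 * np.pi * (r / wavelength - speed * t))

# At the decoder, the changed offsets simply trigger equation (2.1) again:
#   V[i] = vertex_position(G1[i], G2[i], new_offsets[i])
```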
Another example of reference-grid animation is the morphing of a face, as
illustrated in the sequence of images from Figure 2-15. In this case, the absolute
values of the reference-grid coordinates are successively decreased, such that the
original Human Head model in Figure 2-15(a) is deformed progressively to a box,
Figure 2-15(h), where all the reference-grid coordinates are zero.
Considering the above, one concludes that MESHGRID is perfectly suited for
creating sequences of animated and/or morphed volumetric objects, and for
encoding and displaying the results.

(a) (b) (c) (d) (e) (f) (g) (h)

Figure 2-15: Morphing of a human head to a box by deforming the reference-grid.

2.3.3 Streaming
To demonstrate the scalability and streaming capabilities of MESHGRID, we have
implemented a client-server application in order to illustrate how these features can

be effectively applied in practice. The server is a content provider of 3D scenes
represented in the MESHGRID format. It accepts connections coming from client
terminals, and performs streaming of the 3D content towards them. The client
application may run on terminals with very different capabilities regarding the
processing power, display resolution, or network bandwidth. Thanks to the
scalability and streaming capabilities of MESHGRID, the client is able to request and
decode only the necessary ROIs and resolution levels, matching the limits of the
terminal. The server is capable of parsing and indexing the MESHGRID stream,
containing the mesh encoding, offset encoding or grid encoding, and is able to
access them in a random order, according to the requests of each client. The client
decodes the MESHGRID stream, which can be received in any logical order, and
displays the 3D scene.
Finally, we point out that the scalable nature of MESHGRID allows for an
optimized transmission of MESHGRID encoded streams over error-prone channels. In
this sense, the client-server application served as a development platform for our
joint-source and channel coding design described in Chapter 4.
2.4 CONCLUSIONS
This chapter gives an overview of MESHGRID, which is one of the generic 3D
object representation formats from MPEG-4 AFX. The unique characteristic of the
MESHGRID representation method is that it preserves both volumetric information as
well as the surface description of the model. The presence of the volumetric
information allows for specifying material properties (e.g. density, elasticity)
characterizing the volume of the model; these can be useful for example in virtual
reality applications involving haptic devices for force-feedback.
MESHGRID is multi-scalable, in terms of level of detail and quality, and allows for
view-dependent reconstruction of the object. Additionally, its ROI-based encoding
capability allows for efficient storing and retrieval of 3D objects of any size.
Furthermore, MESHGRID offers particular animation and morphing possibilities, based on the
volumetric modeling of 3D objects.
MESHGRID is appropriate for representing dynamic models as well, with
applications in scientific simulations or virtual environments, or for encoding the
animation as a 3D interactive video. The 3D objects encoded as MESHGRID streams
can be efficiently transmitted and adapted to terminals with various capabilities,
without the need of re-encoding the models.






Chapter 3
WAVELET-BASED L-INFINITE CODING
OF MESHES
3.1 INTRODUCTION
The diversification of content and the increasing demand in mobility have led to a
proliferation of heterogeneous terminals, with diverse capabilities. Efficient storage
and transmission of digital data is therefore a critical problem, which can be solved
by compressing the original data based on some predefined criteria.
There is a broad range of applications (e.g. in the medical area), where one cannot
afford information loss due to compact coding. A viable solution in this case
consists in lossless coding possibly coupled with multi-functionality support, such as
scalability and progressive (lossy-to-lossless) reconstruction of the input data.
Lossless coding is hampered, however, by the fairly low achievable lossless compression ratios. There are other applications, such as those in the field of remote
sensing, where one can accept information loss in favor of higher compression
ratios, provided that the incurring distortions are rigorously bounded. In such
applications, lossy or near-lossless compression are suitable, but an appropriate
distortion measure needs to be employed in order to accurately quantify and control
the distortion incurred by the compression system.
Ideally, the distortion measure should be one that is easy to compute, has certain
usefulness in analysis and offers a perceptual meaningfulness, in the sense that a low
(high) distortion measure implies good (poor) perceptual quality. Unfortunately,
there is no single distortion measure in the literature that satisfies all three
requirements. Undoubtedly, the most commonly met distortion measure in the
literature is the squared error, commonly referred to as the Mean Squared Error
(MSE). The MSE, though, satisfies only the first two requirements. The MSE is an
average distortion measure, giving a good approximation of the global error and an
expression of the overall perceptual quality. One of its major drawbacks consists in
the fact that it does not exploit local knowledge. Moreover, the local error
behavior is lost, due to an averaging of the reconstruction error throughout the entire
data. However, there are applications that require imposing a tight bound on the
individual elements of the error signal, i.e. constraining the elements of the
reconstruction error signal to be under some given thresholds. Therefore, a new
distortion measure is needed to address these issues. As an answer, the L-infinite
norm criterion has been proposed as a candidate for a perceptually meaningful norm,
in that the distortion provides a good approximation of the maximum local error.
In this chapter, we propose the novel concept of local error control in lossy coding
of meshes. In this respect, a scalable L-infinite mesh coding approach is
proposed, simultaneously performing local error control and providing scalability in
L-infinite sense. Different architectures based on which a scalable L-infinite mesh
codec design can be made are investigated. The analysis reveals that intra-band
wavelet codecs, such as MESHGRID [Salomie 2004a] should be considered in order
to provide fine-granular scalability in L-infinite sense. Consequently, the proposed
L-infinite coding approach is instantiated by using our scalable MESHGRID coding
system [Salomie 2004a]. In this context, two L-infinite distortion estimators are
derived, expressing the L-infinite distortion in the spatial domain as a function of
quantization errors occurring in the wavelet domain. Employing these estimators
enables an optimized rate-allocation for given local-error bounds, without
performing an actual decoding of the mesh. Finally, in order to minimize the overall
bit-rate subject to a pre-defined local error upper-bound, a constrained-optimization
problem is solved, wherein the layers to be transmitted on each subband are
determined. In this context, a fast algorithm for solving the optimization problem is
conceived, enabling a real-time implementation of the rate-allocation.
The chapter is structured as follows. We begin by defining the L-2 norm and the
L-infinite norm, in section 3.2. Next, an overview of the near-lossless L-infinite-
oriented data compression techniques is given in section 3.3. Section 3.4 presents
the mathematical formulation of the smallest upper bound of the L-infinite distortion
in lifting-based wavelet transforms. Section 3.5 derives L-infinite estimators for the
considered wavelet-based codec design. The link between the L-infinite metric and
the Hausdorff metric is given in section 3.6. Section 3.7 details some special
considerations for the MESHGRID instantiation of the proposed techniques. Section
3.8 reports the experimental results obtained with our L-infinite coding approach.
Finally, section 3.9 draws the conclusions of this chapter.


3.2 DISTORTION METRICS
The classical approach towards lossy compression consists in optimizing the
compression scheme so that to maximize the overall compression ratio for a given
reconstruction error. The quality of reproduction can be measured by using a
distortion measure $d(\mathbf{x},\mathbf{y}) \ge 0$ that assigns a distortion or cost to the reproduction of the input $\mathbf{x} = (x_1, \ldots, x_k)$ by the output $\mathbf{y} = (y_1, \ldots, y_k)$. The quality of reproduction is
highly dependent on the type of data that is compressed. This is because the
significance of each pixel in the data varies with data type. For example, let us
consider the compression of two images, one being a grayscale photograph and the
other one a cartographic image, representing measurements of heights for example.
While for the photographic image the usually employed MSE (and alike) quality
metrics are global and reflect the overall quality of the image after compression, for
the cartographic image, the quality needs to be expressed by the precision of each
pixel measurement after compression is applied. Hence, while for some data the
overall error is to be accounted for, for other data the local error is of utmost
importance. Unfortunately, there is no single distortion measure in the literature
capable of expressing both global as well as local characteristics in the error signal.
Hence, the metric to be considered in a specific application depends on the input
data type and on the subsequent processing steps and interpretation of the
compressed data.
3.2.1 L-1 and L-2 Distortion Metrics
A common distortion measure in the literature is the absolute error, defined as:

$$ d_1(\mathbf{x},\mathbf{y}) = \sum_{i=1}^{n} \left| x_i - y_i \right| \qquad (3.1) $$

Relation (3.1) can be written as $d_1(\mathbf{x},\mathbf{y}) = \left\| \mathbf{x}-\mathbf{y} \right\|_1$, the corresponding norm being the L-1 norm, expressed as $\left\| \mathbf{x}-\mathbf{y} \right\|_1 = \sum_{i=1}^{n} \left| x_i - y_i \right|$. Another common distortion


measure, which is easy to compute and has certain usefulness in analysis, is the (un-normalized) squared error, defined as:

$$ d_2(\mathbf{x},\mathbf{y}) = \sum_{i=1}^{n} \left( x_i - y_i \right)^2 \qquad (3.2) $$

Relation (3.2) can be written as $d_2(\mathbf{x},\mathbf{y}) = \left\| \mathbf{x}-\mathbf{y} \right\|_2^2$, where $\left\| \mathbf{x}-\mathbf{y} \right\|_2 = \left( \sum_{i=1}^{n} \left| x_i - y_i \right|^2 \right)^{1/2}$ is the L-2 norm.


In a similar manner, we can define a more general L-p norm as:

$$ \left\| \mathbf{x}-\mathbf{y} \right\|_p = \left( \sum_{i=1}^{n} \left| x_i - y_i \right|^p \right)^{1/p}. \qquad (3.3) $$
This allows for an extension of the distortion measure d to any power of the L-p
norm:

$$ d_{p,r}(\mathbf{x},\mathbf{y}) = \left\| \mathbf{x}-\mathbf{y} \right\|_p^r \qquad (3.4) $$

For $r = p$ we obtain an additive distortion, simply denoted as $d_p(\mathbf{x},\mathbf{y}) = \left\| \mathbf{x}-\mathbf{y} \right\|_p^p$, commonly known as the $r$-th power distortion. It is clear that the distortions corresponding to $r = p = 1$ and $r = p = 2$, i.e. the L-1 and the L-2 distortions, are particular cases of the $r$-th power distortion measure. It is important to highlight the
terminology difference between the L-p norm, defined by (3.3), and the L-p
distortion, expressed by (3.4). We point out that in the remainder of this work,
unless it is clearly stated, we systematically refer to the L-p distortion instead of the
L-p norm.
3.2.2 L-infinite Distortion Metric
A variation of the L-p norm is the L-infinite norm, defined as $\left\| \mathbf{x}-\mathbf{y} \right\|_\infty = \max_i \left| x_i - y_i \right|$. The L-infinite distortion measure is given by:

$$ d_\infty(\mathbf{x},\mathbf{y}) = \max_{1 \le i \le n} \left| x_i - y_i \right| \qquad (3.5) $$
In general, the squared error distortion (or L-2 distortion) expressed by (3.2) is
regarded as a useful indicator of perceptual quality. Its statistical average, which is
commonly referred to as the Mean Squared Error (MSE), is then regarded as giving
a good approximation of the global error. MSE is undoubtedly the most commonly
met distortion measure in the coding literature.
However, one of its major drawbacks is that it does not reflect local error
behavior, which is lost due to an averaging of the reconstruction error throughout
the data. Local-error control is imperative in a broad range of coding applications in
various domains, such as medical imaging, remote sensing, space imaging or image
archiving. Such applications require imposing a tight bound on the individual
elements of the error signal, i.e. constraining the elements of the reconstruction error
to be lower than some given threshold(s). To solve this problem, the L-infinite norm
has been proposed as a meaningful distortion criterion, in that, opposite to the L-2
norm, the distortion provides a good expression of the maximum local error [Alecu
2004, Alecu 2006, Alecu 2003b].
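To make the difference between an average and a local error measure concrete, the short Python sketch below evaluates the distortions (3.1), (3.2) and (3.5) on a toy signal in which a single sample is badly reconstructed; the numbers are purely illustrative.

```python
# Sketch: L-1, L-2 and L-infinite distortions on a toy signal. A single corrupted
# sample barely moves the MSE, but dominates the L-infinite distortion.
def d1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))        # relation (3.1)

def d2(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))      # relation (3.2)

def d_inf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))        # relation (3.5)

original      = [10.0] * 1000
reconstructed = [10.0] * 999 + [60.0]                   # one badly coded sample

print(d1(original, reconstructed))                      # 50.0
print(d2(original, reconstructed) / len(original))      # MSE = 2.5, looks harmless
print(d_inf(original, reconstructed))                   # 50.0, the local error is large
```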
In the next section, we will give an overview of a representative set of coding
techniques that make use of the L-infinite metric as target distortion metric. The
section follows a chronological review of various techniques proposed in the
literature, starting from the near-lossless predictive-based image coding systems
proposed in the mid nineties and ending with the recently proposed scalable L-
infinite wavelet-based image coding systems, which are of particular importance in
the context of our work.

3.3 NEAR-LOSSLESS L-INFINITE-ORIENTED DATA
COMPRESSION
Data compression based on the L-infinite distortion metric is a relatively young
field of research. The largest part of techniques proposing the L-infinite norm as the
target distortion measure has been developed for image compression. One of the
first proposed L-infinite-oriented compression methods is the Context-Based
Adaptive Lossless Image Codec (CALIC) proposed by Wu and Memon [Wu
1997b], based on a spatial-domain predictive coding scheme.
Statistical modeling of the data source is an inherent step in the general scheme of
almost any data compression system. A major difficulty in the statistical modeling
of continuous-tone images arises from the large size of the alphabet (caused by the
large number of possible pixel values). Context modeling of the alphabet symbols
leads to a large number of possible contexts, or model states. If the number of these
contexts is too large with respect to the size of the image, one cannot reach good
estimates of the conditional probabilities on the model states, due to the lack of
sufficient samples. This problem is commonly known in literature as the sparse
context or context dilution problem, and has been formulated theoretically by
Rissanen [Rissanen 1984, 1983] in the framework of stochastic complexity, as the
model cost. The CALIC algorithm attempts to reduce the model cost by
employing a two-step approach involving prediction followed by quantization and
encoding of the residuals.
In the prediction step, CALIC employs a simple gradient-based non-linear
prediction scheme. The scheme, suggestively entitled Gradient-Adjusted Predictor
(GAP), adjusts prediction coefficients based on estimates of local gradients. A
unique feature of CALIC is the use of a large number of modeling contexts to
condition the non-linear predictor and to make it adaptive to varying source
statistics. The non-linear predictor adapts via an error feedback mechanism. In this
adaptation process, CALIC only estimates the expectation of prediction errors
conditioned on a large number of contexts, rather than estimating a large number of
conditional error probabilities. Thus, the estimation technique can afford a high
number of modeling contexts, without suffering from the previously described
sparse context problem.
Another key feature of CALIC is the way it distinguishes between binary and
continuous-tone images on a local, rather than a global, basis. Thus, the coding
system operates in two modes, namely a binary mode and a continuous-tone mode.
The binary mode is designed for the situation in which the current locality of the
input image has no more than two intensity values, i.e. the neighboring pixels of the
current pixel have no more than two different values. In this situation, the pixel
values are coded directly, while predictive coding is employed in the continuous-
tone situation. The selection between the continuous-tone mode and the binary mode
is thus based on pixel contexts, the choice between these two being automatic and
completely transparent to the user. The binary operating mode allows for efficient
coding performance in the case of uniform or nearly uniform image areas, graphics,
rasterized documents, line art, and any combination of natural images with one or
more of these types, i.e. the so-called compound images.
The near-lossless CALIC compression scheme [Wu 1997b, Wu 1996] provides
excellent coding results, in both the L-2 and the L-infinite sense, in comparison to
other state-of-the-art coding schemes. However, for large values of the maximum
allowed pixel error, the CALIC codec starts to lose ground to wavelet-based coders,
in terms of the peak signal-to-noise ratio (PSNR). This is mainly caused by the
existence of the residue quantizer in the prediction loop. X. Wu and P. Bao [Wu
2000, Wu 1997a] have proposed an enhanced near-lossless compression scheme,
based on the CALIC scheme, in which they correct the prediction biases caused by
the quantization of the prediction residues. Additionally, in [Wu 2000] a
generalization of the binary mode of CALIC has been proposed. Originally,
CALIC's binary mode was designed to improve the coding efficiency on compound images that mix photographs, graphics, and text. For continuous-tone images, the CALIC codec would switch automatically to continuous mode. It was also shown that the binary mode technique was highly effective for L-infinite-constrained compression of images with rich high-frequency components.
The coding schemes discussed so far ([Wu 2000, Wu 1996]) provide excellent
near-lossless compression results, and represent the state-of-the-art in L-infinite-
oriented coding. However, neither of these spatial-domain compression schemes is
capable of generating an embedded, while at the same time L-infinite-oriented, bit
stream.
A near-lossless hybrid compression scheme with progressive capabilities has been
proposed by Ansari et al. [Ansari 1998], in which near-lossless compression and
scalability are provided within the same framework by truncation of the embedded
bit stream at an appropriate point, followed by transmission of a residual layer so as
to provide the near-lossless bound. However, this coding scheme still does not allow
for genuine scalability in L-infinite sense.
I. Avcibas et al. proposed a spatial-domain compression technique that provides
lossless and near-lossless compression, combined with progressive transmission
capabilities [Avcibas 2002]. The scheme allows for near-lossless reconstruction,
with respect to a given bound, after decoding of each layer of the successively
refinable bit stream. The bounds are pre-set at coding time and remain fixed
throughout the decoding phase. Successive refinement is obtained by computing
improved estimates of the probability density function (PDF) of each pixel with
successive passes through the image, until all the pixels have been uniquely
determined. The restriction of the support of the PDF to a successively refined set of
intervals leads to the integration of lossless/near-lossless compression in a single
framework. In other words, diminishing of the support of the PDF in each pass to a
narrower interval allows for progressive coding capabilities, while fixing of the size
of the interval allows for near-lossless coding.
This compression scheme is organized as a multi-pass scheme. In the first pass, a
Gaussian model is assumed for the statistics of the current pixel. Additionally,
assumptions are made that the data is stationary and Gaussian in a small
neighborhood, allowing for the use of linear prediction. The current pixel is
predicted using the six pixels situated in its causal neighborhood; causal pixels refer
to those pixels that have been visited before a given pixel in a raster-scan order. The
linear prediction coefficients that are employed in the linear regression model are
derived using the forty context pixels. A Gaussian density function is fitted for the
current pixel, with the linear prediction value taken as the optimal estimate of its
mean, and the mean square prediction error as its variance. The actual value of the
pixel, or the sub-interval in which it is found, is then predicted using an optimal
guessing technique. Thus, the support of the current pixel's PDF, initially taken as $[0, 255]$, is divided into a set of intervals of the form $(i, i + 2\delta + 1]$, with $\delta$ a non-negative integer. The intervals are sorted with respect to their probability mass obtained from the estimated density function, and the interval in which the pixel is located is then identified. At the end of the first pass, the maximum error in the reconstructed image is $\delta$, since the midpoint of the interval is selected as the reconstructed pixel value.
In the subsequent passes of the algorithm, the PDF of each pixel is further refined,
by narrowing the size of the interval in which it is known to be located. The
refinement scheme uses both causal and noncausal pixels; noncausal pixels refer to
the pixels that have been visited after the current pixel in a raster scan order. A fact
to be noted is that in this update scheme, the causal pixels already have a refined
PDF, while the noncausal pixels do not. By narrowing the size of the intervals by a
factor of two at every additional pass, the algorithm generates an L-infinite-oriented embedded bit-stream, going down to $\delta = 0$, i.e. to lossless compression.
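The following sketch only mimics how the guaranteed error bound evolves over the successive passes described above, with the interval width halved at each pass; it is not an implementation of the codec of [Avcibas 2002].

```python
# Sketch: evolution of the near-lossless bound delta when every pass narrows the
# pixel intervals by a factor of two, ending at delta = 0 (lossless).
def bound_schedule(initial_delta: int):
    delta = initial_delta
    while delta > 0:
        yield delta      # after this pass, every pixel error is at most delta
        delta //= 2      # the next pass halves the interval width
    yield 0              # final pass: all pixels uniquely determined

print(list(bound_schedule(16)))   # [16, 8, 4, 2, 1, 0]
```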
Moving towards transform-domain compression schemes, the coding technique of
Karray et al. [Karray 1998] proposes near-lossless image compression based on a
filter-bank coding approach. A perfect reconstruction octave-band filter-bank is
employed in order to decompose the input signal. Scalar uniform quantization is
subsequently applied to every resulting subband signal, followed by Huffman coding, which is performed in order to losslessly encode the quantized transform coefficients.
Given a set of perfect reconstruction filters, optimal quantizers are computed so as to minimize the global Huffman bit rate, while targeting a certain near-lossless criterion. This criterion is specified in terms of a proposed percentage $p\%$ of reconstruction errors $\Delta x$ that must lie within a confidence interval. The confidence interval is chosen for simplicity as having a constant amplitude that is dependent on a given threshold $t$:

$$ \mathrm{prob}\left\{ \left| \Delta x \right| \le t \right\} \ge p\% \qquad (3.6) $$
It has been shown by Karray et al. [Karray 1998] that the reconstruction errors can be written as linear combinations of the quantization errors multiplied with some of the filter coefficients, depending on the dimension of the input signal (1D or 2D) and on the parity of the indices of the reconstruction errors $\Delta x$. Thus, for one-dimensional signals and one iteration of the filter-bank, two cases are considered, namely even and odd, i.e. $\Delta x_{2n}$ and $\Delta x_{2n+1}$. In the two-dimensional situation, again for one iteration, these extend to four possible even-odd combinations, i.e. $\Delta x_{2n,2m}$, $\Delta x_{2n+1,2m}$, $\Delta x_{2n,2m+1}$ and $\Delta x_{2n+1,2m+1}$. As the quantization errors have a uniform distribution, it is possible to establish an upper bound on the reconstruction errors, for each of the four combinations. The linearity of the expressions of the reconstruction errors extends to the expressions of the upper bounds, in that these can be written as linear combinations of the maximum quantization errors per subband (in absolute value $\Delta_i / 2$), multiplied with some positive constants; the term $\Delta_i$ denotes the quantization bin size in subband $i$. The constants are written in terms of the filter coefficients. Their positive nature arises from the fact that one obtains an upper bound of a reconstruction error if and only if every maximum quantization error has the same sign as the constant it is multiplied by. The L-infinite norm of the reconstruction errors is then the maximum of the upper bounds obtained for each of the allowed combinations. For $J$ filter-bank iterations, this result extends to:

$$ \left\| \Delta x \right\|_\infty \le \max_{u,v} \left\{ \max_{n,m} \left| \Delta x_{2^J n + u,\, 2^J m + v} \right| \right\} \qquad (3.7) $$

where $u, v = 0, \ldots, 2^J - 1$, i.e. $2^{2J}$ possible combinations, stemming from the $J$ successive interpolations in the synthesis phase.
Since $\Delta x$ is the contribution of the interpolated samples $\Delta x_{u,v}$, the relation $\left\| \Delta x \right\|_\infty \le t$ must be satisfied for every possible combination $(u, v)$, yielding a system of linear constraints. Thus, for every $j = 1, \ldots, 2^{2J}$, one can write:

$$ \sum_{i} a_{ij} \frac{\Delta_i}{2} \le t \qquad (3.8) $$

where $a_{ij}$ denotes a positive constant that multiplies the subband quantization bin size $\Delta_i$. This constant depends on the set of filter coefficients involved in subband $i$ and on the combination $j = 1, \ldots, 2^{2J}$ of $u$ and $v$.
A constrained optimization problem is thus created, in which one minimizes the total bit-rate subject to the system of $2^{2J}$ linear constraints (3.8). The solutions of this optimization problem then give the optimal subband quantization bin sizes $\Delta_i$.
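As a toy illustration of this constrained choice, the Python sketch below checks constraints of the form (3.8) and selects, by brute force, the feasible bin-size vector with the smallest value of a simple rate proxy; the constants $a_{ij}$, the candidate bin sizes and the rate model are invented for the example and are not those derived in [Karray 1998].

```python
# Sketch: pick subband bin sizes minimizing an (invented) rate proxy subject to
# linear constraints of the form (3.8). Weights a[j][i] are placeholders.
from itertools import product
from math import log2

a = [[1.0, 0.5, 0.5, 0.25],    # one row per even/odd combination j (2 shown),
     [0.25, 0.5, 0.5, 1.0]]    # one column per subband i
t = 4.0                        # near-lossless threshold on the reconstruction error

def feasible(bins):            # every constraint (3.8) must hold
    return all(sum(aj[i] * bins[i] / 2.0 for i in range(len(bins))) <= t for aj in a)

def rate_proxy(bins):          # coarser bins -> fewer bits (crude model)
    return sum(max(0.0, 8.0 - log2(b)) for b in bins)

candidates = (1, 2, 4, 8, 16)
best = min((b for b in product(candidates, repeat=4) if feasible(b)), key=rate_proxy)
print(best)
```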
As pointed out by Karray et al. [Karray 1998], the constraints have been taken into account through an upper bound of the L-infinite norm of the reconstruction error. It has been shown [Karray 1998] that in practice, i.e. for natural images, one does not reach the proposed upper bound. Moreover, the obtained maximum error is usually smaller than the required threshold $t$, practical observations [Karray 1998] indicating an order of magnitude of $t/2$. As a result, the authors of [Karray 1998] have proposed a further refinement of the quantization steps, based on the statistical properties of the reconstruction error distribution. Indeed, as the quantization errors have a uniform distribution, the reconstruction error becomes, in view of the previously described linearity, a linear combination of uniform distributions. The result is a Gaussian-distributed reconstruction error, with zero mean and a variance $\sigma_{u,v}^2$ that depends on the quantization step values and on the set of filter coefficients involved in each subband. The probability term that appears in (3.6) can then be written as:

$$ \mathrm{prob}\left\{ \left| \Delta x \right| \le t \right\} = \frac{1}{2^{2J}} \sum_{u,v} \mathrm{prob}\left\{ \left| \Delta x_{u,v} \right| \le t \right\} = \frac{1}{2^{2J}} \sum_{u,v} \mathrm{erf}\left( \frac{t}{\sigma_{u,v} \sqrt{2}} \right) \qquad (3.9) $$
where one takes advantage of the Gaussian nature of the reconstruction error
distribution.
A scaling up of the bin sizes $\Delta_i$ to new quantization bin sizes $\Delta_i' = \alpha \Delta_i$ has been proposed [Karray 1998], in which multiplication of the bin sizes by a common factor $\alpha$ implies that the variance $\sigma_{u,v}^2$ is also multiplied by a factor $\alpha^2$. The scaling factor is found by solving relation (3.9) for the new variance $\alpha^2 \sigma_{u,v}^2$. In this manner one obtains a controlled refinement of the quantization bins, in that $p\%$ of the reconstruction errors are guaranteed to lie below the required threshold with a probability that asymptotically goes to one.
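A small sketch of evaluating the probability term of (3.9), as reconstructed above, is given below; the per-combination standard deviations are invented values used only to exercise the formula.

```python
# Sketch: probability that the reconstruction error stays below t, per relation (3.9),
# averaged over the (u, v) combinations. The sigma values are illustrative only.
from math import erf, sqrt

def prob_within(t, sigmas):
    return sum(erf(t / (s * sqrt(2.0))) for s in sigmas) / len(sigmas)

sigmas = [0.8, 1.1, 1.3, 1.7]        # one value per (u, v) combination (2^{2J} = 4 here)
for t in (1.0, 2.0, 3.0):
    print(t, round(prob_within(t, sigmas), 4))
```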
Lossless image compression techniques based on a predictive coding approach
process image pixels in some fixed and pre-determined order, modeling the intensity
of each pixel as dependent on the intensity values at a fixed and pre-determined
neighborhood of previously visited pixels. Thus, such techniques form predictions
and model the prediction-error based solely on local information. This property
provides an attractive framework for near-lossless compression, in which a
constraint is imposed on the local reconstruction error. On the other hand, the
performance limitations of these techniques stem from the very local nature of the
prediction. Thus, such techniques do not adapt well to the non-stationary nature of
the image data, and are usually incapable of capturing global patterns that
influence the intensity value of the current pixel being processed.
Multi-resolution techniques offer a convenient way to overcome highly localized
processing by separating the information into several scales, and exploiting the
predictability of insignificance of pixels from a coarse scale to a larger area at a finer
scale. In addition, these techniques provide a natural framework for generating a
fully scalable bit-stream. Despite these attractive features, the applicability of multiresolution transforms in near-lossless compression has long remained limited and largely unexplored. This was mainly due to the difficulties met in
providing accurate translation of the spatial-domain upper bound on the pixel value
errors into a suitable criterion in the transform domain.
A first attempt of exploiting multiresolution coding techniques and combining
them with near-lossless compression has been done by Ansari et al. [Ansari 1998].
The scheme of [Ansari 1998] is organised as a two-layer coding scheme of lossy
plus near-lossless coding. A SPIHT-based [Said 1996a] fully embedded multi-
resolution coder is used to generate a lossy base layer, while a context-based lossless
coder is designed to code the difference between the original image and the lossy
reconstruction. The second predictive-based layer can be thus seen as a refinement
layer, which, when added to the base layer, produces an image that meets the
specified near-lossless tolerance. However, this approach did not solve the
fundamental problems in L-infinite coding, that is, (i) providing a clear link between
the spatial-domain L-infinite distortion and its transform-domain equivalent, and (ii)
providing scalability in L-infinite sense.
Recently, an alternative and unique approach to this problem has been proposed
for images and volumetric data by Alecu et al. in [Alecu 2005, Alecu 2006, Alecu
2001, 2003b]. In these works, a new wavelet-based L-infinite-constrained scalable
coding approach has been proposed, that generates a fully embedded L-infinite-
oriented bit-stream, while retaining the coding performance and scalability options
of state-of-the-art wavelet codecs. This is achieved by making the link between the
spatial and wavelet domain distortions, and establishing the signal-independent
expression of the smallest upper bound of the spatial-domain L-infinite distortion,
based on the distortion in the wavelet domain. For each wavelet subband, a generic
embedded deadzone scalar quantizer is assumed, and the bound is applicable for any
non-integer lifting-based wavelet transform, of any dimension.
In their works [Alecu 2006, Alecu 2001, 2003b], Alecu et al. proposed both
signal-independent and signal-dependent estimators of the spatial L-infinite
distortion. The estimators represent the smallest upper bound of the distortion
obtained at various truncation points of the embedded bit-stream, corresponding to
the end of every subband bit-plane. The subband quantization bins are progressively
encoded, in a subband by subband and bit-plane by bit-plane manner. While the
signal-independent L-infinite estimator has low computational complexity, in
practice it over-estimates the obtained L-infinite distortion. Therefore, a more
accurate signal-dependent estimator has been proposed in [Alecu 2006], based on
the statistical properties of the highest-rate spatial-domain errors independently
generated in the wavelet subbands. This estimator derives a priori, i.e. before actual
decoding is performed, the expected distortion obtained in any given truncation
point corresponding to the end of a subband bit-plane. Moreover, given the
subbands' statistics, the estimator represents the smallest upper bound of the
distortion, guaranteeing that the obtained distortion will be smaller than or at most
equal to the estimated value.
Following a similar approach as in [Alecu 2006, Alecu 2001], we have developed
and validated the novel concept of local error control in mesh geometry encoding. In
contrast to traditional mesh coding systems that use the mean-square error as the
target distortion metric, in this chapter we propose a new L-infinite mesh coding
approach, for which the target distortion metric is the L-infinite distortion. In this
context, a novel wavelet-based L-infinite-constrained coding approach for meshes is
investigated, which ensures that the maximum error between the original and
decoded meshes is lower than a given upper-bound. In addition, the L-infinite
estimator is formulated for a generic family of embedded deadzone uniform scalar
quantizers. Furthermore, the proposed system achieves scalability in L-infinite
sense, that is, any decoding of the input stream will correspond to a perfectly
predictable L-infinite distortion upper-bound. Moreover, the proposed approach
enables a fast real-time implementation of the rate-allocation, and it preserves all the
scalability features and animation capabilities of the employed scalable mesh codec.
In terms of structure, section 3.4 summarizes the derivations made by Alecu et al. in
[Alecu 2005, Alecu 2006, Alecu 2001, 2003b], while all the subsequent sections
detail the novel L-infinite mesh coding approach proposed in this work.


3.4 THE SMALLEST UPPER BOUND OF THE L-
INFINITE DISTORTION IN LIFTING BASED
WAVELET TRANSFORMS
In this section, we detail the theoretical derivations of the smallest upper bound of
the L-infinite distortion in lifting-based wavelet transforms. These derivations have
been proposed by Alecu et al. in [Alecu 2005, Alecu 2006, Alecu 2001, 2003b]. We
find it important to detail them here, for the sake of completeness and in order to
ensure a good understanding of our L-infinite mesh coding approach. For simplicity
in the description, the derivations are expressed for 2D signals, as in this way they
are easy to be followed, and they are still general enough to be straightforwardly
extended to the 3D case needed for meshes. For a generalized formulation of the smallest upper bound of the L-infinite distortion in the n-dimensional case, we refer
to [Alecu 2005, Alecu 2006].
3.4.1 The Lifting-based Wavelet Transform
The most-common multiresolution transform implementation employed in order
to decorrelate the input signal is the lifting-based synthesis of the wavelet transform
[Sweldens 1996]. Based on a lifting-based implementation one can establish the
relation between the wavelet- and spatial-domain L-infinite distortions, as shown
next.

Figure 3-1: Forward 1D wavelet transform using lifting.

Figure 3-2: Inverse 1D wavelet transform using lifting.
For a 1D signal, the classical 1D forward and inverse lifting-based wavelet
transforms are performed as illustrated in Figure 3-1 and Figure 3-2. The predictors
$\{ p^{(i)}(z) : 1 \le i \le M \}$ of the lifting steps, respectively the updaters $\{ u^{(i)}(z) : 1 \le i \le M \}$ of the dual lifting steps, are Laurent polynomials of the form $p^{(i)}(z) = \sum_k p_k^{(i)} z^{k}$ and $u^{(i)}(z) = \sum_k u_k^{(i)} z^{k}$.
For a 2D signal, the 1D forward transform is applied vertically and then
horizontally on the approximation subband at each decomposition level. The 2D
reconstruction follows a row-column order, i.e. the 1D inverse transform is firstly
applied horizontally and then vertically. The classical implementation of the inverse
2D wavelet transform using lifting is illustrated in Figure 3-3. Notice that in this
implementation, the individual contribution of each wavelet subband to the
reconstructed signal is lost when passing from the wavelet to the spatial domain, due
to the intermediate addition operations of the even and odd reconstructed signal
samples performed in each inverse 1D lifting-scheme block. To preserve the
contributions originating from the different subbands, an alternative inverse 2D
lifting-based wavelet transform implementation has been proposed in [Alecu 2005,
Alecu 2003b], as depicted in Figure 3-4. We assume that the 2D forward transform
is implemented by 1D transforms first applied on columns and then on rows, which
corresponds to the row-column order application of the 1D inverse wavelet
transforms, as illustrated in Figure 3-4.

Figure 3-3: Classical 2D inverse wavelet transform using lifting.

Figure 3-4: Alternative 2D inverse wavelet transform using lifting.


For one-level wavelet decomposition, the interpolations introduced by this
scheme on respectively each of the two dimensions give rise to the four possible
types of image-domain samples, according to the parity of their indexes. Let us
denote by $N_H$ and $N_W$ the height, respectively the width of the input data. For a one-level wavelet decomposition, we obtain the series of dependencies between the spatial-domain samples $\{ x_{2n+a,2m+b} : a,b \in \{0,1\} \}$ and the corresponding contributing wavelet coefficients, where $n \in \left[ 0, \lfloor (N_H - 1)/2 \rfloor \right]$, $m \in \left[ 0, \lfloor (N_W - 1)/2 \rfloor \right]$, and $\lfloor a \rfloor$ denotes the integer part of $a$. These dependencies are linear and can be written explicitly in the form [Alecu 2005, Alecu 2006, Alecu 2003b]:

$$ x_{2n+a,2m+b} = \sum_{s} \; \sum_{p = p_a^{(s)\,low}}^{p_a^{(s)\,high}} \; \sum_{q = q_b^{(s)\,low}}^{q_b^{(s)\,high}} k_{p,q,a,b}^{(s)} \, W_{n+p,\,m+q}^{(s)}, \qquad a,b = 0,1 \qquad (3.10) $$
where $s$ is the index that identifies respectively the four wavelet subbands LL, LH, HL, HH, and $W_{n+p,m+q}^{(s)}$ are the wavelet coefficients belonging to subband $s$ that contribute to $x_{2n+a,2m+b}$.
For every subband $s$, these coefficients define a window in the wavelet domain of dimensions $\left( p_a^{(s)\,high} - p_a^{(s)\,low} + 1 \right) \times \left( q_b^{(s)\,high} - q_b^{(s)\,low} + 1 \right)$, located around and including a central coefficient $W_{n,m}^{(s)}$, the position within the window being referred to with the aid of the indexes $p, q$. For every such position it is shown [Alecu 2005, Alecu 2006, Alecu 2003b] that there exists a corresponding constant (or subband weight) $k_{p,q,a,b}^{(s)} \in \mathbb{R}$, derived from the predictor and updater coefficients $\{ p_k^{(i)}, u_k^{(i)} : 1 \le i \le M \}$, which multiplies the wavelet coefficient $W_{n+p,m+q}^{(s)}$. The dependencies between image-domain samples and wavelet-domain coefficients are illustrated in Figure 3-5.
The coordinate system $(m, n)$ refers to each individual subband, the origin $(0,0)$ being located in the lower-left corner of the subband. The small rectangle within each such wavelet subband $s$ delimits the set of wavelet coefficients $w_{(m,n)+(p,q)}^{(s)}$, i.e. the matrix $\mathbf{W}_{m,n}^{(s)}$, defined for the given subband coordinate vector $(m, n)$. The coordinate system $(p, q)$ refers to the position of a wavelet coefficient $w_{(m,n)+(p,q)}^{(s)}$ within the matrix $\mathbf{W}_{m,n}^{(s)}$. It can be noticed that the origin $(0,0)$ in the $(p,q)$ reference system is located approximately in the centre of the matrix $\mathbf{W}_{m,n}^{(s)}$, rather than on its boundary; thus, the limits $\left( p^{(s)}, q^{(s)} \right)$ always obey the conditions $p^{(s)\,high} \ge 0$, $q^{(s)\,high} \ge 0$, respectively $p^{(s)\,low} \le 0$, $q^{(s)\,low} \le 0$.



Figure 3-5: The dependencies of the samples $x_{2n+a,2m+b}$, $a,b = 0,1$ with respect to the wavelet coefficients $W^{(s)}$ ($s \in \{LL, LH, HL, HH\}$) of the four subbands for a one-level wavelet decomposition. The reference system $(n,m)$ refers to the subband $s$, with the $(0,0)$ coordinates located in the lower-left corner of each subband, while the reference system $(p,q)$ refers to the dependency window within each subband.

Figure 3-6: Representation for the (4,4) symmetrical biorthogonal transform of the two-dimensional weight matrices $\mathbf{K}_{1,1}^{(s)}$, for $s = 1, \ldots, 4$; the classical notation $(x, y)$ of a 2D coordinate system indicates here the directions in which the indices $\left( p^{(s)}, q^{(s)} \right)$ increase numerically, for every subband $s$.


We also define a two-dimensional weight matrix $\mathbf{K}_{a,b}^{(s)}$, of which the subband weights $k_{p,q,a,b}^{(s)} \in \mathbb{R}$ are derived from the prediction and update coefficients of the lifting-based wavelet transform [Alecu 2003b]:

$$ \mathbf{K}_{a,b}^{(s)} = \left( k_{p,q,a,b}^{(s)} \right)_{p,q} \qquad (3.11) $$

We notice that, formally, an appropriate mapping is needed in (3.11) between the indexing in $\mathbf{K}_{a,b}^{(s)}$ (requiring positive indices) and the indexing in the $(p,q)$ reference system, which spans both positive and negative numbers.
We illustrate two examples of the weight matrices $\mathbf{K}_{a,b}^{(s)}$, for the case of the (4,4) symmetrical biorthogonal transform [Daubechies 1998] and for $a = b = 1$. Figure 3-6 depicts the two-dimensional case, with $s = 1, \ldots, 4$. A similar example is depicted in Figure 3-7 for the three-dimensional case and $s = 1, \ldots, 8$.

Figure 3-7: Representation for the (4,4) symmetrical biorthogonal transform of the three-dimensional weight matrices $\mathbf{K}_{1,1}^{(s)}$, for $s = 1, \ldots, 8$; the classical notation $(x, y, z)$ of a 3D coordinate system indicates here the directions in which the indices $\left( p^{(s)}, q^{(s)}, r^{(s)} \right)$ increase numerically, for every subband $s$.

3.4.2 The Maximum Absolute Difference (MAXAD)
This section introduces the MAXAD, which is the Maximum Absolute Difference
between the spatial-domain sample values in the original and reconstructed data
respectively. In this section we summarize the theoretical findings of [Alecu 2005,
Alecu 2006, Alecu 2003b] establishing the link between the MAXAD and the
quantization errors occurring in the wavelet-domain.
For simplicity in the description, let us consider the case of images and a one-
level two-dimensional wavelet decomposition scheme. We quantize the wavelet
coefficients by applying uniform quantization with a bin size $\Delta_s^{(1)}$ on each subband $s$ of the first decomposition level. Our option for these quantizers is motivated by the
fact that, for Generalized-Gaussian sources, the uniform quantizer is the optimum
quantizer in MAXAD-sense, as shown by Alecu et al. in [Alecu 2003a] (see theorem
2). The wavelet coefficients are commonly modeled using Generalized Gaussian
distributions [Taubman 2002], hence, using uniform quantizers is the most
appropriate design option. However, we notice that embedded double-deadzone
quantizers (or successive approximation quantizers SAQs [Shapiro 1993]) are also
commonly used in practice in most of the scalable wavelet-based codec designs.
Despite their sub-optimality in MAXAD sense, SAQs are attractive from an
implementation point of view, as they are closely related to bit-plane coding. A
general formulation of the MAXAD for an arbitrary deadzone size is deferred to
section 3.5.4, which will include the (embedded) uniform quantizers and SAQ as
particular cases. In the following, we focus on fixed-rate uniform quantizers and
formulate the MAXAD in this case. The remainder of this section follows closely
the notations and derivations from [Alecu 2003b].
Denote by $\hat{W}_{n+p,m+q}^{(s)}$ the reconstructed coefficients corresponding to $W_{n+p,m+q}^{(s)}$. Using relation (3.10), the reconstructed image pixels $\hat{x}_{2n+a,2m+b}$ can be written as:

$$ \hat{x}_{2n+a,2m+b} = \sum_{s} \; \sum_{p = p_a^{(s)\,low}}^{p_a^{(s)\,high}} \; \sum_{q = q_b^{(s)\,low}}^{q_b^{(s)\,high}} k_{p,q,a,b}^{(s)} \, \hat{W}_{n+p,\,m+q}^{(s)}, \qquad a,b = 0,1 \qquad (3.12) $$
Uniform quantization produces the errors $W_{i,j}^{(s)} - \hat{W}_{i,j}^{(s)} \in \left[ -\Delta_s^{(1)}/2, \; \Delta_s^{(1)}/2 \right)$ in each subband $s$ in the wavelet domain, where $(i,j)$ are the coordinates within the reference system of each wavelet subband. After image reconstruction, the absolute errors in the image domain are written as $\left| x_{u,v} - \hat{x}_{u,v} \right|$, where $(u,v)$ are now the coordinates in the image-domain reference system. The MAXAD, which we will denote throughout this chapter as $M$, is given by the upper bound of the image-domain absolute errors, that is $M = \sup_{u,v} \left\{ \left| x_{u,v} - \hat{x}_{u,v} \right| \right\}$. Relations (3.10) and (3.12) refer to all image-domain samples $x_{u,v}$, respectively reconstructed samples $\hat{x}_{u,v}$, which are divided into four distinct types according to the parity of the indices $u$ and $v$. Let $u = 2n + a$, $v = 2m + b$, with $a, b = 0, 1$ depending on the parity of $u$ and $v$. We can write:
$$ M = \max_{a,b=0,1} \left\{ \sup_{n,m} \left| x_{2n+a,2m+b} - \hat{x}_{2n+a,2m+b} \right| \right\} = \max_{a,b=0,1} \left\{ \sum_{s} \; \sum_{p = p_a^{(s)\,low}}^{p_a^{(s)\,high}} \; \sum_{q = q_b^{(s)\,low}}^{q_b^{(s)\,high}} \left| k_{p,q,a,b}^{(s)} \right| \frac{\Delta_s^{(1)}}{2} \right\} \qquad (3.13) $$

We point out that:

$$ \sup_{n,m} \left| \sum_{p = p_a^{(s)\,low}}^{p_a^{(s)\,high}} \; \sum_{q = q_b^{(s)\,low}}^{q_b^{(s)\,high}} k_{p,q,a,b}^{(s)} \left( W_{n+p,m+q}^{(s)} - \hat{W}_{n+p,m+q}^{(s)} \right) \right| = \sum_{p = p_a^{(s)\,low}}^{p_a^{(s)\,high}} \; \sum_{q = q_b^{(s)\,low}}^{q_b^{(s)\,high}} \left| k_{p,q,a,b}^{(s)} \right| \frac{\Delta_s^{(1)}}{2} \qquad (3.14) $$

expresses the fact that, in the worst-case scenario, the upper or the lower bounds of the errors $W_{n+p,m+q}^{(s)} - \hat{W}_{n+p,m+q}^{(s)}$ and the corresponding subband weights $k_{p,q,a,b}^{(s)}$ have the same sign.
For every subband $s$ and combination $\{a, b\}$ we denote $K_{a,b}^{(s)} = \sum_{p = p_a^{(s)\,low}}^{p_a^{(s)\,high}} \sum_{q = q_b^{(s)\,low}}^{q_b^{(s)\,high}} \left| k_{p,q,a,b}^{(s)} \right|$. Equation (3.13) becomes:

$$ M = \max_{a,b=0,1} \left\{ \sum_{s} K_{a,b}^{(s)} \frac{\Delta_s^{(1)}}{2} \right\} \qquad (3.15) $$
Let us denote by $\{a', b'\}$ the combination $\{a, b\}$ that gives the maximum in this relation, for any given values of $\Delta_s^{(1)}$. As shown in section 3.4.3, for each of the considered wavelet transform instantiations, there exists such a combination $\{a', b'\}$ providing the maximum $M$ in (3.15) for any $\Delta_s^{(1)}$. Expression (3.15) can then be expanded as:

$$ M = K_{a',b'}^{(LL)} \frac{\Delta_{LL}^{(1)}}{2} + K_{a',b'}^{(LH)} \frac{\Delta_{LH}^{(1)}}{2} + K_{a',b'}^{(HL)} \frac{\Delta_{HL}^{(1)}}{2} + K_{a',b'}^{(HH)} \frac{\Delta_{HH}^{(1)}}{2} \qquad (3.16) $$
In other words, for one decomposition-level, the total MAXAD is the weighted sum
of maximum quantization errors occurring in the four wavelet subbands. According
to (3.16), the contribution of the lowest frequency subband to the total MAXAD is
$K_{a',b'}^{(LL)} \Delta_{LL}^{(1)}/2$, and $\Delta_{LL}^{(1)}/2$ represents the upper bound of the error in this subband.
By applying again the wavelet transform to the LL subband of the first decomposition level, the maximum absolute error $\Delta_{LL}^{(1)}/2$ in this subband can be written in terms of the MAXAD contributions of the four additional subbands of the second decomposition level. Similar to (3.16) we can write:

$$ \frac{\Delta_{LL}^{(1)}}{2} = K_{a',b'}^{(LL)} \frac{\Delta_{LL}^{(2)}}{2} + K_{a',b'}^{(LH)} \frac{\Delta_{LH}^{(2)}}{2} + K_{a',b'}^{(HL)} \frac{\Delta_{HL}^{(2)}}{2} + K_{a',b'}^{(HH)} \frac{\Delta_{HH}^{(2)}}{2} \qquad (3.17) $$
By replacing (3.17) into expression (3.16), we obtain the expression of the image
domain MAXAD for two levels of decomposition:

$$ M = \left( K_{a',b'}^{(LL)} \right)^2 \frac{\Delta_{LL}^{(2)}}{2} + \sum_{l=1}^{2} \left( K_{a',b'}^{(LL)} \right)^{l-1} \left( K_{a',b'}^{(LH)} \frac{\Delta_{LH}^{(l)}}{2} + K_{a',b'}^{(HL)} \frac{\Delta_{HL}^{(l)}}{2} + K_{a',b'}^{(HH)} \frac{\Delta_{HH}^{(l)}}{2} \right) \qquad (3.18) $$
By iteratively repeating the decomposition process for an $L$-level lifting-based wavelet decomposition scheme, the maximum absolute difference for a 2D wavelet transform is given by:

$$ M = \left( K_{a',b'}^{(LL)} \right)^L \frac{\Delta_{LL}^{(L)}}{2} + \sum_{l=1}^{L} \left( K_{a',b'}^{(LL)} \right)^{l-1} \left( K_{a',b'}^{(LH)} \frac{\Delta_{LH}^{(l)}}{2} + K_{a',b'}^{(HL)} \frac{\Delta_{HL}^{(l)}}{2} + K_{a',b'}^{(HH)} \frac{\Delta_{HH}^{(l)}}{2} \right), \qquad (3.19) $$

where $\Delta_s^{(l)}$ is the bin-size of the uniform quantizer applied on the subbands $s$ of decomposition level $l$, $1 \le l \le L$, and $L$ is the total number of decomposition levels.
In the mesh-coding context, the L-infinite distortion corresponds to the maximum absolute difference (MAXAD) between the actual vertex positions and the decoded positions in the reconstructed mesh. The MAXAD estimators in the case of meshes and the various approaches that could be followed in the design of an L-infinite mesh coding approach are investigated in section 3.5.
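To make the use of (3.15)-(3.19) concrete, the Python sketch below first forms constants $K_{a,b}^{(s)}$ as sums of absolute subband weights and then evaluates the $L$-level MAXAD bound of (3.19) from given bin sizes. The weight stencils and bin sizes are illustration values only (the stencils reuse the distinct 5.3 weight values listed in Figure 3-8(b) for the even-even case); they are not taken from the MESHGRID codec itself.

```python
# Sketch: K constants of (3.15) as sums of |k| over a dependency window, and the
# L-level MAXAD upper bound of (3.19). Stencils and bin sizes are illustrative.
def K_constant(stencil):
    """Sum of absolute subband weights over the dependency window."""
    return sum(abs(w) for row in stencil for w in row)

# Example stencils (even-even sample of the 5.3 transform, values as in Fig. 3-8(b)):
K_LL = K_constant([[1.0]])                                   # 1.0
K_LH = K_constant([[-0.25], [-0.25]])                        # 0.5  (= 2*beta)
K_HL = K_constant([[-0.25, -0.25]])                          # 0.5
K_HH = K_constant([[0.0625, 0.0625], [0.0625, 0.0625]])      # 0.25 (= 4*beta^2)
print(K_LL, K_LH, K_HL, K_HH)

def maxad(k_ll, k_lh, k_hl, k_hh, bins):
    """bins[l-1] = (d_LL, d_LH, d_HL, d_HH) for level l; only the coarsest-level
    LL bin size enters the first term, as in (3.19)."""
    L = len(bins)
    m = (k_ll ** L) * bins[-1][0] / 2.0
    for l, (_, d_lh, d_hl, d_hh) in enumerate(bins, start=1):
        m += (k_ll ** (l - 1)) * (k_lh * d_lh + k_hl * d_hl + k_hh * d_hh) / 2.0
    return m

# Maximizing combination {a',b'} = {1,1} of the 5.3 transform: all four constants
# equal 1; two levels with detail bin sizes 4 (finest) and 2 (coarsest).
bins = [(4.0, 4.0, 4.0, 4.0), (2.0, 2.0, 2.0, 2.0)]
print(maxad(1.0, 1.0, 1.0, 1.0, bins))                       # 10.0
```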
3.4.3 MAXAD Examples
We exemplify the concepts introduced up to this point with a selection of the
most popular wavelet transforms included in the JPEG-2000 standard [Taubman
2002]. The first two transforms taken into consideration are instances of a family of
symmetric, biorthogonal wavelet transforms built from the interpolating Deslauriers-
Dubuc scaling functions, namely the (2,2) interpolating transform which has 5 and 3
taps for the analysis and synthesis filters respectively. We denote it as the 5.3
transform. The (4,2) interpolating transform has 9 and 7 taps for the analysis and
synthesis filters respectively, which we will denote as the 4.2 transform. The last
transform is the (4,4) symmetrical biorthogonal transform, which has 9 taps for the
analysis filter and 7 taps for the synthesis filter, which we will denote as the 9.7
transform.


3.4.3.1 5.3 Transform
The 1D forward 5.3 transform can be factorized via lifting as follows:

$$ \begin{aligned} s_n^{(0)} &= x_{2n} \\ d_n^{(0)} &= x_{2n+1} \\ d_n &= d_n^{(0)} - \alpha \left( s_n^{(0)} + s_{n+1}^{(0)} \right) \\ s_n &= s_n^{(0)} + \beta \left( d_{n-1} + d_n \right) \end{aligned} \qquad (3.20) $$

where the constants $\alpha, \beta$ are given by $\alpha = 0.5$ and $\beta = 0.25$ respectively. The corresponding 1D inverse 5.3 transform is given by:

$$ \begin{aligned} s_n^{(0)} &= s_n - \beta \left( d_{n-1} + d_n \right) \\ d_n^{(0)} &= d_n + \alpha \left( s_n^{(0)} + s_{n+1}^{(0)} \right) \\ x_{2n+1} &= d_n^{(0)} \\ x_{2n} &= s_n^{(0)} \end{aligned} \qquad (3.21) $$
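A minimal Python sketch of the lifting steps (3.20)-(3.21) is given below; the periodic border extension and the even-length assumption are simplifications made for the illustration and are not part of the codec specification.

```python
# Sketch: 1D 5.3 forward/inverse lifting per (3.20)-(3.21), with periodic borders
# and an even-length input (both are simplifying assumptions for this example).
ALPHA, BETA = 0.5, 0.25

def forward_53(x):
    s = x[0::2]                                                      # s_n^(0) = x_{2n}
    d = x[1::2]                                                      # d_n^(0) = x_{2n+1}
    n = len(d)
    d = [d[i] - ALPHA * (s[i] + s[(i + 1) % n]) for i in range(n)]   # prediction step
    s = [s[i] + BETA * (d[i - 1] + d[i]) for i in range(n)]          # update step
    return s, d

def inverse_53(s, d):
    n = len(d)
    s0 = [s[i] - BETA * (d[i - 1] + d[i]) for i in range(n)]             # undo update
    d0 = [d[i] + ALPHA * (s0[i] + s0[(i + 1) % n]) for i in range(n)]    # undo prediction
    x = [0.0] * (2 * n)
    x[0::2], x[1::2] = s0, d0
    return x

x = [3.0, 7.0, 1.0, 4.0, 6.0, 2.0, 5.0, 8.0]
s, d = forward_53(x)
print(inverse_53(s, d))        # perfect reconstruction of x
```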
For a one-level wavelet decomposition scheme, by applying an inverse 2D transform (see Figure 3-4), one obtains dependencies of the form (3.10). These dependencies are depicted in Figure 3-8(a) for all of the four possible spatial-domain cases. The weights $k_{p,q,a,b}^{(s)}$ corresponding to each subband $s$ are given in Figure 3-8(b). The upper bound of the spatial-domain absolute error is formulated in (3.22) for each of the four parity cases.

$$ \begin{aligned} \sup_{n,m} \left| x_{2n,2m} - \hat{x}_{2n,2m} \right| &= \frac{\Delta_{LL}^{(1)}}{2} + 2\beta \frac{\Delta_{LH}^{(1)}}{2} + 2\beta \frac{\Delta_{HL}^{(1)}}{2} + 4\beta^2 \frac{\Delta_{HH}^{(1)}}{2} \\ \sup_{n,m} \left| x_{2n,2m+1} - \hat{x}_{2n,2m+1} \right| &= 2\alpha \frac{\Delta_{LL}^{(1)}}{2} + \frac{\Delta_{LH}^{(1)}}{2} + 4\alpha\beta \frac{\Delta_{HL}^{(1)}}{2} + 2\beta \frac{\Delta_{HH}^{(1)}}{2} \\ \sup_{n,m} \left| x_{2n+1,2m} - \hat{x}_{2n+1,2m} \right| &= 2\alpha \frac{\Delta_{LL}^{(1)}}{2} + 4\alpha\beta \frac{\Delta_{LH}^{(1)}}{2} + \frac{\Delta_{HL}^{(1)}}{2} + 2\beta \frac{\Delta_{HH}^{(1)}}{2} \\ \sup_{n,m} \left| x_{2n+1,2m+1} - \hat{x}_{2n+1,2m+1} \right| &= 4\alpha^2 \frac{\Delta_{LL}^{(1)}}{2} + 2\alpha \frac{\Delta_{LH}^{(1)}}{2} + 2\alpha \frac{\Delta_{HL}^{(1)}}{2} + \frac{\Delta_{HH}^{(1)}}{2} \end{aligned} \qquad (3.22) $$
By replacing the values of $\alpha, \beta$ in these equations, one obtains that:

$$ M = \max_{a,b=0,1} \left\{ \sum_{s=1}^{4} K_{a,b}^{(s)} \frac{\Delta_s^{(1)}}{2} \right\} = \sup_{n,m} \left| x_{2n+1,2m+1} - \hat{x}_{2n+1,2m+1} \right|. \qquad (3.23) $$
Generalizing to $L$ decomposition levels one obtains the MAXAD, given by:

$$ M = \frac{\Delta_{LL}^{(L)}}{2} + \sum_{l=1}^{L} \left( \frac{\Delta_{LH}^{(l)}}{2} + \frac{\Delta_{HL}^{(l)}}{2} + \frac{\Delta_{HH}^{(l)}}{2} \right) \qquad (3.24) $$
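As a quick numerical illustration of (3.24), with bin sizes chosen arbitrarily for the example, take $L = 2$ levels, $\Delta_{LH}^{(1)} = \Delta_{HL}^{(1)} = \Delta_{HH}^{(1)} = 4$ and $\Delta_{LL}^{(2)} = \Delta_{LH}^{(2)} = \Delta_{HL}^{(2)} = \Delta_{HH}^{(2)} = 2$; then

$$ M = \frac{2}{2} + \frac{4+4+4}{2} + \frac{2+2+2}{2} = 1 + 6 + 3 = 10, $$

i.e. no reconstructed sample can deviate from the original by more than 10, irrespective of the actual signal content.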


Figure 3-8: The 5.3 transform: (a) 2D dependencies between spatial domain pixels and wavelet coefficients for the four possible spatial-domain cases, and (b) subband weights $k_{p,q,a,b}^{(s)}$.


3.4.3.2 4.2 Transform
The 1D forward 4.2 transform can be factorized via lifting as follows:

$$ \begin{aligned} s_n^{(0)} &= x_{2n} \\ d_n^{(0)} &= x_{2n+1} \\ d_n &= d_n^{(0)} - \alpha \left( s_n^{(0)} + s_{n+1}^{(0)} \right) + \beta \left( s_{n-1}^{(0)} + s_{n+2}^{(0)} \right) \\ s_n &= s_n^{(0)} + \gamma \left( d_{n-1} + d_n \right) \end{aligned} \qquad (3.25) $$

where the constants $\alpha, \beta, \gamma$ are defined as $\alpha = 0.5625$, $\beta = 0.0625$ and $\gamma = 0.25$. The corresponding 1D inverse 4.2 transform is given by:

$$ \begin{aligned} s_n^{(0)} &= s_n - \gamma \left( d_{n-1} + d_n \right) \\ d_n^{(0)} &= d_n + \alpha \left( s_n^{(0)} + s_{n+1}^{(0)} \right) - \beta \left( s_{n-1}^{(0)} + s_{n+2}^{(0)} \right) \\ x_{2n+1} &= d_n^{(0)} \\ x_{2n} &= s_n^{(0)} \end{aligned} \qquad (3.26) $$
The spatial-to-wavelet domain dependencies for this transform are shown in Figure 3-9, and the subband weights $k_{p,q,a,b}^{(s)}$ are given in Figure 3-10. Spatial-domain MAXADs for the four possible spatial-domain cases are given in equations (3.27).

$$ \begin{aligned} \sup_{n,m} \left| x_{2n,2m} - \hat{x}_{2n,2m} \right| &= \frac{\Delta_{LL}^{(1)}}{2} + 2\gamma \frac{\Delta_{LH}^{(1)}}{2} + 2\gamma \frac{\Delta_{HL}^{(1)}}{2} + 4\gamma^2 \frac{\Delta_{HH}^{(1)}}{2} \\ \sup_{n,m} \left| x_{2n,2m+1} - \hat{x}_{2n,2m+1} \right| &= 2(\alpha+\beta) \frac{\Delta_{LL}^{(1)}}{2} + \frac{\Delta_{LH}^{(1)}}{2} + 4\gamma(\alpha+\beta) \frac{\Delta_{HL}^{(1)}}{2} + 2\gamma \frac{\Delta_{HH}^{(1)}}{2} \\ \sup_{n,m} \left| x_{2n+1,2m} - \hat{x}_{2n+1,2m} \right| &= 2(\alpha+\beta) \frac{\Delta_{LL}^{(1)}}{2} + 4\gamma(\alpha+\beta) \frac{\Delta_{LH}^{(1)}}{2} + \frac{\Delta_{HL}^{(1)}}{2} + 2\gamma \frac{\Delta_{HH}^{(1)}}{2} \\ \sup_{n,m} \left| x_{2n+1,2m+1} - \hat{x}_{2n+1,2m+1} \right| &= 4(\alpha+\beta)^2 \frac{\Delta_{LL}^{(1)}}{2} + 2(\alpha+\beta) \frac{\Delta_{LH}^{(1)}}{2} + 2(\alpha+\beta) \frac{\Delta_{HL}^{(1)}}{2} + \frac{\Delta_{HH}^{(1)}}{2} \end{aligned} \qquad (3.27) $$
By replacing the values of $\alpha, \beta, \gamma$, one obtains a similar result as for the 5.3 transform, namely $M = \sup_{n,m} \left| x_{2n+1,2m+1} - \hat{x}_{2n+1,2m+1} \right|$. The generalization to $L$ decomposition levels yields the following expression of the MAXAD:

$$ M = \left[ 2(\alpha+\beta) \right]^{2L} \frac{\Delta_{LL}^{(L)}}{2} + \sum_{l=1}^{L} \left( \left[ 2(\alpha+\beta) \right]^{2l-1} \frac{\Delta_{LH}^{(l)}}{2} + \left[ 2(\alpha+\beta) \right]^{2l-1} \frac{\Delta_{HL}^{(l)}}{2} + \left[ 2(\alpha+\beta) \right]^{2l-2} \frac{\Delta_{HH}^{(l)}}{2} \right) \qquad (3.28) $$


Figure 3-9: The 4.2 transform: 2D dependencies between spatial domain pixels and
wavelet coefficients for the four possible spatial-domain cases.

Figure 3-10: The 4.2 transform: subband weights $k_{p,q,a,b}^{(s)}$.

3.4.3.3 9.7 Transform
The 1D forward 9.7 transform can be factorized via lifting as:

$$ \begin{aligned} s_n^{(0)} &= x_{2n} \\ d_n^{(0)} &= x_{2n+1} \\ d_n^{(1)} &= d_n^{(0)} + \alpha \left( s_n^{(0)} + s_{n+1}^{(0)} \right) \\ s_n^{(1)} &= s_n^{(0)} + \beta \left( d_{n-1}^{(1)} + d_n^{(1)} \right) \\ d_n^{(2)} &= d_n^{(1)} + \gamma \left( s_n^{(1)} + s_{n+1}^{(1)} \right) \\ s_n^{(2)} &= s_n^{(1)} + \delta \left( d_{n-1}^{(2)} + d_n^{(2)} \right) \\ s_n &= \zeta \, s_n^{(2)} \\ d_n &= d_n^{(2)} / \zeta \end{aligned} \qquad (3.29) $$

where the constants $\alpha, \beta, \gamma, \delta, \zeta$ are given by:

$$ \alpha \approx -1.586134342, \quad \beta \approx -0.05298011854, \quad \gamma \approx 0.8829110762, \quad \delta \approx 0.4435068522, \quad \zeta \approx 1.149604298. $$
The corresponding 1D inverse 9.7 transform is given by:

$$ \begin{aligned} s_n^{(2)} &= s_n / \zeta \\ d_n^{(2)} &= \zeta \, d_n \\ s_n^{(1)} &= s_n^{(2)} - \delta \left( d_{n-1}^{(2)} + d_n^{(2)} \right) \\ d_n^{(1)} &= d_n^{(2)} - \gamma \left( s_n^{(1)} + s_{n+1}^{(1)} \right) \\ s_n^{(0)} &= s_n^{(1)} - \beta \left( d_{n-1}^{(1)} + d_n^{(1)} \right) \\ d_n^{(0)} &= d_n^{(1)} - \alpha \left( s_n^{(0)} + s_{n+1}^{(0)} \right) \\ x_{2n+1} &= d_n^{(0)} \\ x_{2n} &= s_n^{(0)} \end{aligned} \qquad (3.30) $$
The spatial-to-wavelet domain dependencies are shown in Figure 3-11, the subband weights $k_{p,q,a,b}^{(s)}$ being illustrated in Figure 3-12. Spatial-domain MAXADs are given in (3.31), and, similar to the previous two transforms, we obtain again that $M = \sup_{n,m} \left| x_{2n+1,2m+1} - \hat{x}_{2n+1,2m+1} \right|$.
$$ \sup_{n,m} \left| x_{2n+a,2m+b} - \hat{x}_{2n+a,2m+b} \right| = K_{a,b}^{(LL)} \frac{\Delta_{LL}^{(1)}}{2} + K_{a,b}^{(LH)} \frac{\Delta_{LH}^{(1)}}{2} + K_{a,b}^{(HL)} \frac{\Delta_{HL}^{(1)}}{2} + K_{a,b}^{(HH)} \frac{\Delta_{HH}^{(1)}}{2}, \qquad a,b = 0,1 \qquad (3.31) $$

where the constants $K_{a,b}^{(s)}$ are obtained, as in (3.15), by summing the absolute values of the subband weights of Figure 3-12, and are closed-form expressions in $\alpha, \beta, \gamma, \delta$ and $\zeta$.
The $L$-level generalization for this transform gives the following MAXAD expression:

$$ M = \left( K_{1,1}^{(LL)} \right)^L \frac{\Delta_{LL}^{(L)}}{2} + \sum_{l=1}^{L} \left( K_{1,1}^{(LL)} \right)^{l-1} \left( K_{1,1}^{(LH)} \frac{\Delta_{LH}^{(l)}}{2} + K_{1,1}^{(HL)} \frac{\Delta_{HL}^{(l)}}{2} + K_{1,1}^{(HH)} \frac{\Delta_{HH}^{(l)}}{2} \right), \qquad (3.32) $$

where $\{a', b'\} = \{1, 1\}$ is the maximizing combination, as for the previous two transforms.
Figure 3-11: The 9.7 transform: 2D dependencies between spatial domain pixels and
wavelet coefficients for the four possible spatial-domain cases.



Figure 3-12: The 9.7 transform: subband weights $k_{p,q,a,b}^{(s)}$.

3.5 SCALABLE L-INFINITE CODING OF MESHES
3.5.1 Scalable Mesh Coding Techniques
Today, a large number of applications make use of digital 3D graphics, in various
domains such as entertainment, architecture, design, education and medicine.
Furthermore, the diversification of content and the increasing demand for mobility have led to a proliferation of heterogeneous terminals, such as high-end graphic
workstations, portable computers, game consoles, high-resolution TV sets, or low-
power mobile devices. Optimally addressing 3D graphics applications in this context
requires a scalable coding system, in order to efficiently store, transmit and display
3D graphics content on a wide variety of end-user terminals, featuring different
requirements in terms of resolution and quality and having different computational
capabilities. Such a 3D graphics coding system should (i) generate a scalable bit-
stream that could be progressively transmitted and decoded, possibly going up to a

lossless reconstruction of the input mesh, (ii) allow a minimum distortion at any bit-
rate, or (iii) guarantee a target distortion at a minimum bit-rate, according to the end-
user requirements.
A broad range of scalable mesh coding techniques has been proposed in the
literature in order to meet these complex requirements. These include connectivity-
driven approaches that progressively encode a 3D mesh by gradually simplifying it
to a base mesh, having a much smaller number of vertices than the original one.
Progressive mesh coders compress the base mesh and the series of reversed
simplification operations. Among such codecs, one enumerates the mesh coding
approach introduced by Hoppe in [Hoppe 1996], the Progressive Forest Split (PFS)
approach of Taubin et al. [Taubin 1998a], the Compressed Progressive Mesh (CPM)
of Pajarola and Rossignac proposed in [Pajarola 2000], and the Valence-Driven
Conquest (VD) technique of Alliez and Desbrun [Alliez 2001]. Li and Kuo
introduced the concept of embedded coding in order to encode the mesh
connectivity and geometry in a progressive and inter-dependent manner [Li 1998a].
One of the most representative codecs from this category is 3DMC [Pereira 2002,
Walsh 2002] which has been standardized by MPEG-4 AFX [ISO/IEC 2004] as a
high-level tool for scalable encoding of 3D models. 3DMC employs two algorithms,
i.e. the Topological Surgery (TS) method [Taubin 1998b] to encode the base mesh,
and the PFS of [Taubin 1998a] to refine the mesh. 3DMC losslessly encodes the
connectivity information of the base mesh and its refinement description, and
employs a lossy predictive-based scheme to encode the vertex coordinates, in which
the prediction errors are quantized and then arithmetic coded.
A second major class in scalable mesh coding includes geometry-driven
compression techniques. One can enumerate among them spatial-domain
approaches, such as the Kd-tree decomposition of Gandoin and Devillers [Gandoin
2002], who proposed a mesh coding strategy where connectivity coding is guided by
geometry coding, and the progressive mesh codec of Peng and Kuo [Peng 2005b],
based on octree decompositions of the input mesh. A second major category within
the class of geometry-driven compression techniques is given by transform-based
approaches. These include spectral coding techniques, as proposed by Karni and
Gotsman in [Karni 2000], and wavelet-based approaches, investigated by Lounsbery
et al. [Lounsbery 1997], Khodakovsky et al. [Khodakovsky 2000], [Khodakovsky
2002], and Salomie et al. [Salomie 2004b]. In this respect, MPEG-4 AFX
[ISO/IEC 2004] has recently standardized two wavelet-based scalable mesh coding
systems, including the Wavelet Subdivision Surfaces (WSS) approach of which the
fundamental ideas were set in [Khodakovsky 2000, Lounsbery 1997], and our
MESHGRID surface representation method proposed by Salomie et al. in [Salomie
2005, Salomie 2004b]. For a detailed survey of both non-scalable and scalable mesh
coding systems, the reader is referred to comprehensive papers in the literature
[Alliez 2003, Gotsman 2002, Peng 2005a].
A critical component in any coding system is rate-allocation, minimizing the
distortion for a given rate budget, or alternatively, minimizing the rate for a specific
distortion-bound. In this context, the ideal distortion metric in lossy coding of
meshes is the Hausdorff distance, as such metric guarantees a maximum local error
between the original and decoded meshes. However, calculating the Hausdorff
distance is computationally expensive, as it requires considerable processing power
and memory space [Al-Regib 2005b]. Even more, in order to solve the rate-
distortion optimization problem in a scalable system, one needs to estimate the
Hausdorff distance for all possible decodings (i.e. at every spatial resolution and
quality level) of the encoded object(s), which is computationally prohibitive. As a
consequence, the traditional approach in the literature (e.g. [Al-Regib 2005b,
Garland 1997, Kompatsiaris 2001, Morán 2004, Park 2006, Payan 2006, Tian
2007a, Tian 2007b, Zhidong 2001]) is to replace the Hausdorff distance by the L-2
distortion, i.e. the mean-square-error (MSE), and to optimize the rate allocation for
this metric. For instance, the L-2 distortion depends on the number of bits used to
quantize the prediction errors in 3DMC [Pereira 2002, Walsh 2002], the refinement
details in WSS [Lounsbery 1997], or the wavelet coefficients in MESHGRID [Salomie
2004b]. The most common L-2 distortion metric is the quadric error metric [Garland
1997], and fast approaches used to compute it in practice include the METRO tool
[Cignoni 1998] or the MESH tool [Aspert 2002]. It is important to observe that these
papers express only the final compression results in terms of Hausdorff distance,
which is used as a common metric for comparing mesh-coding approaches, and not
as a target distortion metric in a rate-distortion optimized mesh coding system.
In the following, we propose an alternative approach, wherein the Hausdorff
distance is replaced by the L-infinite distortion, corresponding to the MAXimum
Absolute Difference (MAXAD) between the actual and decoded vertex positions.
The main rationale behind our approach is that a MAXAD-driven codec performs
local error control. Furthermore, the Hausdorff distance between the original and
reconstructed meshes at a given resolution is upper-bounded by the MAXAD, as we
will show in section 3.6. This indicates that optimizing the rate allocation for a given
MAXAD constraint is equivalent to optimizing the rate allocation such that the
Hausdorff distance is upper bounded by the MAXAD constraint. In other words, L-
infinite-constrained coding of meshes actually implies Hausdorff-distance-
constrained coding of meshes. The latter is highly desirable in mesh coding, but has never been realized in practice so far, due to the unavailability of real-time algorithms to estimate the Hausdorff distance.
Apart from performing L-infinite coding, providing all forms of scalability is equally
important. In this context, of particular importance is to provide scalability in L-
infinite sense, which corresponds to the capability of (i) truncating the compressed
bit-stream at a number of points and (ii) computing accurate estimates of the actual
L-infinite distortion (i.e. maximum local error) at each truncation point, without
performing an actual decoding of the mesh.
3.5.2 Distortion Formulation
Let us consider the generic situation wherein the scalable coding system
decomposes the input 3D object into L different sources of information, each of
these sources being progressively encoded. The sources can be independent regions
of interest, if we consider a spatial partitioning of the input mesh, such as [Park
2006, Tian 2007a, Zhidong 2001], or wavelet subbands, if we consider a wavelet-
based coding approach, such as MESHGRID [Salomie 2004b] and WSS
[Khodakovsky 2000, Lounsbery 1997]. In a lossy compression scenario, the problem
that needs to be solved is to determine the layers of information to be coded for each
source, such that the estimated distortion at the decoder side is minimized subject to
a constraint on the total source rate. One can also pose the alternative problem
wherein we seek to optimize the rate allocation such that the total rate is minimized
subject to a bound on the estimated distortion at the decoder side.
Let us denote by $D_{tot}$ the spatial-domain distortion in the reconstructed mesh, and by $D_l$ the contribution to the total distortion of a given source $l$, $1 \le l \le L$. In general, for additive distortion metrics, the spatial-domain distortion $D_{tot}$ can be expressed as a linear combination of the distortions $D_l(R_l)$ on the sources $l$, of the form:
$$D_{tot} = \sum_{l=1}^{L} q_l\, D_l\left(R_l\right), \qquad (3.33)$$
where $R_l$ is the rate associated with source $l$, and the $q_l$'s weight the different distortion contributions in the total distortion.
This additive distortion model is generic. Indeed, in spatial-partitioning mesh-coding approaches, the sources are regions of interest (or mesh segments), hence $q_l = 1,\ \forall l$, and the sources are independent. In wavelet-domain approaches, such as MESHGRID, each source is a wavelet subband which is progressively encoded in a bitplane-by-bitplane manner, $D_l(R_l)$ is the source distortion-rate function associated with subband $l$, and the weights $q_l$ depend only on the distortion-metric type and the wavelet filter-bank employed.
Concerning the distortion $D_{tot}$, two metrics have been considered in our approach in order to instantiate (3.33). In addition to the squared distortion (or L-2 distortion) of section 3.2.1, which is the additive metric classically used in mesh coding, we have also considered $D_{tot}$ as being the L-infinite distortion of section 3.2.2, i.e. the MAXAD. In wavelet-domain approaches, the smallest upper-bound of the MAXAD follows the additive distortion model of (3.33), even if the L-infinite metric is in principle not additive [Alecu 2006]. This will be detailed hereafter. Additionally, we will also propose two fast methods for estimating the L-infinite distortion without performing an actual encoding, decoding, and reconstruction of the mesh model.
In mesh coding, each $D_l$ in expression (3.33) has the form given by (3.2) in the case of an L-2 distortion metric. In wavelet-based coding approaches, the weights $q_l$ in (3.33) depend on the gains of the wavelet filters. For orthonormal wavelet filters, $q_l = 1$, as the transform is unitary. However, for biorthogonal wavelet filters, one needs to account for the gains of the wavelet filters, and to re-scale the coefficients accordingly, such that the resulting wavelet transform is (approximately) unitary. The weights $q_l$ can then be determined by calculating the L-2 norms of the low- and band-pass wavelet filters. For images, weighting factors that are powers of two have been proposed in [Said 1996b]. For volumetric data, factors of the form $2^{a}\left(\sqrt{2}\right)^{b}$, with $a \in \mathbb{Z},\ a \ge 1$ and $b \in \{-1, 0, 1\}$, yielding a unitary 3D wavelet transform, have been employed in [Schelkens 2003]. In the case of video, we gave an example of the weighting factors $q_l$ in [Verdicchio 2006].
Similar to these previous approaches, it can be easily derived that for MESHGRID the $q_l$ factors also depend on the gains of the wavelet filters used in the wavelet decomposition of the reference-grid. For the particular wavelet transform synthesized by (2.2) and (2.3), one can derive the total L-2 distortion in three dimensions, as explained next.
For simplicity, consider first the one-dimensional one-level inverse wavelet transform given by (2.3) and let $l_n, h_n$ be the quantization noise in the low-pass (L) and band-pass (H) wavelet subbands respectively. Under high-rate assumptions, the quantization noise is zero-mean, stationary and white, with uniform probability density, and furthermore, not correlated with the input signal. Hence, the quantization errors are uncorrelated, i.e. $E\left[l_n l_{n+k}\right] = E\left[h_m h_{m+k}\right] = E\left[l_n h_m\right] = 0$, for any $n, m$ and $k \ne 0$, where $E$ is the expectation operator. From (2.3), it follows that:
$$\varepsilon_{2n} = l_n, \qquad \varepsilon_{2n+1} = h_n + \frac{9}{16}\left(l_n + l_{n+1}\right) - \frac{1}{16}\left(l_{n-1} + l_{n+2}\right), \qquad (3.34)$$
where $\varepsilon_{2n}, \varepsilon_{2n+1}$ are the spatial-domain errors on the even and odd samples respectively. Hence:
$$E\left[\varepsilon_{2n}^2\right] = E\left[l_n^2\right], \qquad E\left[\varepsilon_{2n+1}^2\right] = E\left[h_n^2\right] + \left(2\left(\frac{9}{16}\right)^2 + 2\left(\frac{1}{16}\right)^2\right)E\left[l_n^2\right]. \qquad (3.35)$$
The energy of the quantization noise (i.e. the L-2 distortion) in the spatial domain is given by:
$$D_S = \frac{1}{2}E\left[\varepsilon_{2n}^2\right] + \frac{1}{2}E\left[\varepsilon_{2n+1}^2\right] = \frac{1}{2}E\left[h_n^2\right] + \frac{105}{128}E\left[l_n^2\right]. \qquad (3.36)$$
The terms $E\left[l_n^2\right]$, $E\left[h_n^2\right]$ are the L-2 distortions in the one-dimensional low-pass and band-pass subbands respectively. Also, we notice that the multiplication factors in (3.36) depend on the L-2 norms of the low-pass and band-pass filters, which are respectively $G_L^2 = 1$ and $G_H^2 = 1 + 2\left(9/16\right)^2 + 2\left(1/16\right)^2 = 105/64$. Expression (3.36) gives the multiplication factors for a low-pass and a band-pass component in one dimension and for one decomposition level. Similar to above, in three dimensions, and for one decomposition level, the L-2 distortion is derived as:
$$D_{tot}^{(1)} = \left(\frac{105}{128}\right)^{3} D_{LLL}^{(1)} + \frac{1}{8}\left[D_{HHH}^{(1)} + \frac{105}{64}\left(D_{HHL}^{(1)} + D_{HLH}^{(1)} + D_{LHH}^{(1)}\right) + \left(\frac{105}{64}\right)^{2}\left(D_{HLL}^{(1)} + D_{LHL}^{(1)} + D_{LLH}^{(1)}\right)\right], \qquad (3.37)$$
where $D_s^{(1)}$ is the L-2 distortion on subband $s$ of decomposition level 1; in the naming of the subbands in (3.37), L and H indicate a low-pass and band-pass filtering respectively, performed in a specific direction. Proceeding recursively for an arbitrary number of decomposition levels leads to:
$$D_{tot} = \left(\frac{105}{128}\right)^{3J} D_{LLL}^{(J)} + \sum_{j=1}^{J}\frac{1}{8}\left(\frac{105}{128}\right)^{3(j-1)}\left[D_{HHH}^{(j)} + \frac{105}{64}\left(D_{HHL}^{(j)} + D_{HLH}^{(j)} + D_{LHH}^{(j)}\right) + \left(\frac{105}{64}\right)^{2}\left(D_{HLL}^{(j)} + D_{LHL}^{(j)} + D_{LLH}^{(j)}\right)\right], \qquad (3.38)$$
where $J$ is the number of decomposition levels.
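To make the weighting in (3.38) concrete, the following minimal Python sketch (not part of the thesis software; the function name and the dictionary layout of the per-subband distortions are illustrative assumptions) accumulates the total L-2 distortion from per-subband distortions using the one-dimensional weights 105/128 and 1/2 derived above:

    def total_l2_distortion(d_lll_J, d_detail, J):
        """d_lll_J: L-2 distortion of the LLL subband at level J.
        d_detail: dict mapping (j, name) -> L-2 distortion, for j = 1..J and
        name in 'HHH','HHL','HLH','LHH','HLL','LHL','LLH' (hypothetical layout)."""
        w = 105.0 / 128.0                         # 1-D low-pass weight of (3.36)
        total = (w ** (3 * J)) * d_lll_J          # approximation subband term of (3.38)
        for j in range(1, J + 1):
            scale = w ** (3 * (j - 1))            # level-j details pass through j-1 extra LLL syntheses
            total += scale * (1.0 / 8.0) * (
                d_detail[(j, 'HHH')]
                + (105.0 / 64.0) * (d_detail[(j, 'HHL')] + d_detail[(j, 'HLH')] + d_detail[(j, 'LHH')])
                + (105.0 / 64.0) ** 2 * (d_detail[(j, 'HLL')] + d_detail[(j, 'LHL')] + d_detail[(j, 'LLH')])
            )
        return total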
Although in general the squared error distortion (or L-2 distortion) is regarded as a useful indicator of perceptual quality, its characteristic of reflecting the global
error represents a major drawback in the case of meshes, where a strict control of the
local error is essential. In this context, performing local-error control by following
an L-infinite mesh coding approach is of critical importance. Similarly, in some
applications such as 3D CAD, 3D topography, or in the medical area, the physical
characteristics of the objects of interest, such as volume, shape, topographic heights,
etc. can be measured based on their mesh geometry. Altering the geometry via
compression affects these physical characteristics. Hence, controlling the local error
on the mesh geometry by following an L-infinite mesh coding approach is again
particularly important. Other applications include geometry-based watermarking of
3D models, such as [Benedens 1999, Bors 2006], requiring a very tight control on
the local error resulting from embedding the watermark in the geometry of the 3D
model. Following an L-infinite mesh coding approach offers the possibility to
control the geometric perturbations generated by the watermark embedding process,
and opens the door for applications that simultaneously provide compression and
watermarking of 3D models.
In practice, several possible approaches could be followed in the design of an L-
infinite mesh coding approach. These are investigated next.
3.5.3 Scalable L-infinite Coding Systems
Performing L-infinite coding of meshes and simultaneously providing scalability
and a fine granularity of the output stream is a challenging task that requires a
careful selection of the coding system on which the design is based. In this respect, there are three major classes of coding systems from which the design of a scalable L-infinite codec can start. These include (a) scalable spatial-domain mesh coding approaches, such as [Gandoin 2002, Li 1998a, Park 2006, Peng 2005b, Tian 2007a, Zhidong 2001], and wavelet-based coding approaches, divided into (b) inter-band coding systems, of which the best-known representative technique is WSS [Khodakovsky 2000, Lounsbery 1997], and (c) intra-band coding systems, of which the best-known representative approach is MESHGRID [Salomie 2004b].
Most of these systems share a common methodology to enable quality scalability,
and this is given by the use of scalar quantizers and layered coding. Most techniques
make use of embedded double-deadzone quantizers, known also as successive
approximation quantizers (SAQ) [Shapiro 1993, Taubman 2002], employed for
instance by the spatial-domain mesh coding approach of [Zhidong 2001], or by
MESHGRID [Salomie 2004b] and WSS [Khodakovsky 2000, Lounsbery 1997].
In general, let us consider that the codec employs a generic family of embedded deadzone uniform scalar quantizers $\left\{Q_{\xi,b}\right\}$, in which every source sample $X$ is quantized to [Taubman 2002]:
$$q_b = Q_{\xi,b}\left(X\right) = \begin{cases} \operatorname{sign}\left(X\right)\left\lfloor \dfrac{\left|X\right|}{2^{b}\Delta} + \xi \right\rfloor, & \text{if } \left\lfloor \dfrac{\left|X\right|}{2^{b}\Delta} + \xi \right\rfloor > 0, \\ 0, & \text{otherwise}, \end{cases} \qquad (3.39)$$
where $\Delta > 0$, $\lfloor a\rfloor$ is the integer part of $a$, $b$, $0 \le b < B$, denotes the quantization level, $B$ is the total number of levels, and $\xi$ controls the width of the deadzone, with $\xi \le 1/2$, corresponding to a deadzone bin-size that is larger than or equal to the other bin-sizes [Taubman 2002]. By source samples $X$ one implies vertex coordinates or wavelet coefficients in the case of spatial-domain or wavelet-based mesh coding respectively.
One derives from (3.39) that the width of the deadzone at quantization level $b$, $0 \le b < B$, is given by $\left(2-2\xi\right)2^{b}\Delta$, while the size of the other bins is $2^{b}\Delta$. In the particular case of SAQ, one has $\xi = 0$ in (3.39), i.e. the deadzone size is twice as large as the size of the other bins. Also, if $\xi = 1/2$, the quantizer at $b = 0$ is uniform, corresponding to the optimum embedded quantizer in L-infinite sense [Alecu 2003a]. Finally, we note that a fixed-rate deadzone uniform quantizer would correspond to a particular case of (3.39), implying a single quantization level $b$ and a pre-defined $\Delta$.
Assume in the following that we opt for a spatial-domain approach in our design
of a scalable L-infinite coding system, wherein the input mesh is decomposed into
adjacent regions of interest (ROIs) that are independently quantized and
progressively encoded. Setting a global target MAXAD on the entire 3D object
corresponds to setting the target MAXAD on each ROI, due to the fact that the
regions are independent. A MAXAD estimator in each ROI can be easily formed by deriving the maximum quantization error resulting from the application of $\left\{Q_{\xi,b}\right\}$ at each quantization level $b$, $0 \le b < B$. Assuming that the quantization errors are uniformly distributed within the quantization cells, the quantization error is minimal for mid-tread quantizers [Taubman 2002], and it is induced by the bin with the largest width, which is the deadzone if $\xi \le 1/2$, or any of the non-zero bins if $\xi > 1/2$. Since $\xi \le 1/2$ for all quantization levels $b$, $0 \le b < B$, it follows that the smallest upper bound $M_{tot}$ of the MAXAD $D_{tot}$ for a spatial-domain mesh coding approach is given by:
$$D_{tot} \le M_{tot} = \frac{\left(2-2\xi\right)2^{b}\Delta}{2}, \quad \text{with } 0 \le b < B. \qquad (3.40)$$
It is important to observe from (3.40) that a partial decoding of a given quantization level $b$ will not change $M_{tot}$. That is, in between two successive bit-planes (i.e. for fractional bitplanes), the smallest upper-bound of the MAXAD remains constant, which is consistent with our observations made in [Alecu 2004, Alecu 2006]. The consequence is that an eligible truncation point in L-infinite sense will correspond to a complete decoding of the corresponding bit-plane (or quantization level) inside each ROI. Hence, for a spatial-domain approach, the total number of decodable layers is given by $B$, which is in general a small number. For instance, in the case of SAQ, $B$ corresponds to the number of bits with which the vertices are represented, which is indeed small. Furthermore, from a practical point of view, some of the coarse quantization levels do not even make sense, because the MAXAD will be too high for them, and the object will be much too distorted.
Expression (3.40) also shows that $M_{tot}$ is of the form $M_{tot} = 2^{b}M_0$, $0 \le b < B$. This indicates that, when dropping a layer, $M_{tot}$ increases by a factor of two, which implies a coarse granularity in terms of MAXAD. One concludes that a scalable spatial-domain mesh coding approach yields a limited number of granularity levels, as it produces at most $B$ levels of scalability in L-infinite sense.
Opting for an inter-band wavelet codec, such as WSS [Khodakovsky 2000,
Lounsbery 1997], will suffer from the same problem. The reason is that such codecs
exploit the inter-band statistical dependencies between the wavelet coefficients by
constructing trees (or zerotrees [Said 1996a, Shapiro 1993]) that span the entire
wavelet subbands. Consequently, for such codecs, the MAXAD can be estimated
and guaranteed only at the end of an entire wavelet bit-plane, i.e. a bitplane spanning
across all wavelet subbands. Therefore, similar to a spatial-domain approach, the
number of layers (or eligible truncation points) for an inter-band coding approach is
given by B . If SAQ is employed, B will correspond to the total number of bit-
planes in the binary representation of the wavelet coefficients, which is again a
small figure.
Following an intra-band wavelet-based coding approach, such as MESHGRID, is
then the only remaining option to significantly increase the granularity in L-infinite
sense. The reason is that an intra-band wavelet codec decomposes the input mesh into different sources of information (subbands), which are independently quantized and entropy coded. If $J$ is the total number of wavelet decomposition levels, then the number of subband bit-planes (or layers) constructed by the codec is given by $(7J+1)B$, which is much larger than $B$. This shows that, for the same quantizers, corresponding to a certain value of $B$, an intra-band wavelet codec produces $(7J+1)$ times more layers than a spatial-domain approach or an inter-band wavelet codec, thus significantly increasing the granularity in L-infinite sense.
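For instance, assuming $J = 3$ decomposition levels and $B = 12$ bitplanes per subband (hypothetical but typical values), an intra-band codec offers $(7\cdot3+1)\cdot12 = 264$ eligible truncation points in L-infinite sense, compared with only 12 for a spatial-domain or inter-band approach.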
Apart from providing granularity in L-infinite sense, compression efficiency is
also an important criterion in the design of a compression algorithm. In this context,
the literature shows that intra-band wavelet-based coding provides competitive
compression performance against inter-band wavelet-based coding techniques. For
instance, our intra-band codecs published in [Munteanu 1999a, 1999b] are
competitive against inter-band codecs such as EZW [Shapiro 1993] and the state-of-
the-art SPIHT [Said 1996a]. Later designs improved the performance in intra-band coding; see, e.g., the SPECK coder of W. Pearlman et al. [Pearlman 2004], the EZBC codec of S. Hsiang and J. Woods [Hsiang 2000], or our intra-band QT-L codec [Schelkens 2003], which systematically outperforms the 3D extension of SPIHT in the compression of volumetric data. Finally, and importantly, JPEG-2000 [Taubman 2002], which is the state-of-the-art in still image coding, is an intra-band wavelet codec, not an inter-band one.
From a complementary perspective, an information-theoretic analysis of the
statistical dependencies between the wavelet coefficients given in [Liu 2001] shows
that intra-band statistical dependencies are stronger than inter-band ones for
wavelet-transformed images. This indicates that intra-band models should be
favored over inter-band models in wavelet image coding. Similar to images, in [Satti
2009] we show that the intra-band statistical dependencies are stronger than the
inter-band ones for wavelet-transformed meshes. Very recent developments carried
out by colleagues in our department demonstrate that, on average, intra-band mesh
coding outperforms the state-of-the-art inter-band WSS coding approach
[Khodakovsky 2000, Lounsbery 1997] on a broad range of models and rates. All
these results indicate that opting for intra-band coding is a viable and competitive
approach in scalable coding of meshes.
One concludes that if we wish to (a) perform L-infinite coding, and (b) provide
fine-granular scalability in L-infinite sense, then we should opt for an intra-band
wavelet codec in our design. To date, the most representative such codec is
MESHGRID [Salomie 2004b], which motivates the choice made in our scalable L-
infinite mesh coding instantiation. L-infinite distortion estimators for a wavelet-
based intra-band mesh coding approach are proposed next.
3.5.4 L-infinite Distortion Estimators
3.5.4.1 Theoretical L-infinite Distortion Estimator
Assume in the following a scalable mesh coding technique, wherein the wavelet
subbands are quantized using generic embedded deadzone quantizers $\left\{Q_{\xi,b}\right\}$, as expressed by (3.39).
Intuitively, a quantization error produced in a certain wavelet subband will be
translated (via the inverse wavelet transform) into a corresponding contribution to
the total reconstruction error occurring in the spatial domain. Due to the linear
nature of the wavelet transform, it is possible to define a linear relation combining
the various quantization errors produced in the wavelet subbands into corresponding
errors occurring in the spatial domain. Under worst-case scenario assumptions, it is
then possible to maximize the different error-contributions from the different
wavelet subbands, and determine a smallest upper-bound of the MAXAD.
In section 3.4.2, we have followed such a theoretical approach for estimating the
MAXAD, and showed how, under worst-case scenario assumptions, the maximum
quantization errors from the different wavelet subbands are linearly combined into a
maximum spatial-domain reconstruction error. We must observe though that in those derivations we have considered a particular quantizer instance, corresponding to $\xi = 1/2$ in (3.39), and a subband transmission scheme (or progression scheme) assuming that (i) the bin-sizes at the finest quantization level $\Delta_{s,0}^{(j)}$ vary across the subbands, and (ii) the same number of bit-planes $b$ is discarded across all wavelet subbands. In this case, the smallest upper bound $M_{tot}$ of the MAXAD $D_{tot}$ can be written for any N-dimensional non-integer lifting-based wavelet transform as:
$$D_{tot} \le M_{tot} = 2^{b-1}\left[\left(K_{S_1}\right)^{J}\Delta_{S_1,0}^{(J)} + \sum_{j=1}^{J}\sum_{s=1}^{S_1-1}\left(K_{S_1}\right)^{j-1}K_s\,\Delta_{s,0}^{(j)}\right], \qquad (3.41)$$
where $b \in \mathbb{Z}^{+}$ is the number of discarded bit-planes (i.e. the quantization level) across all subbands, $J$ is the number of wavelet decomposition levels, $\Delta_{s,0}^{(j)}$ is the bin-size of the uniform quantizer at $b = 0$ on subband $s$ of level $j$, $1 \le j \le J$, the $K_s$ are weight factors derived from the predict and update lifting coefficients, and $S_1$ is the number of subbands obtained for one decomposition level. Thus, (3.41) corresponds to a progression scheme transmitting the bit-planes in a predefined manner (i.e. from the lowest to the highest frequencies), and for which a variable subset of subbands is quantized at level $b$, and the remaining ones at level $b+1$.
Predefining the order in which subband bitplanes are transmitted does not
necessarily correspond to an optimal performance in rate-distortion sense. Such a
codec can never claim optimal performance, even if practically its performance
might not be far from optimal. In general, for any intra-band wavelet codec, the
optimum number of bitplanes to be sent for each subband needs to be determined by
a rate-distortion optimization process. This implies a progression scheme that is
driven by a rate-allocation process. The progression scheme in section 3.4.2 is
predefined, hence, (3.41) is not applicable in this context. Furthermore, if one refers
to MESHGRID, its embedded quantizers are different than those assumed in section
3.4.2, and its progression scheme is generic. Therefore, (3.41) is again not
applicable. In the following, we generalize (3.41) and make it applicable to any
intra-band wavelet codec, employing any embedded quantizer instance, as given by
(3.39), and using a generic progression scheme, for which the number of discarded
bitplanes per subband is varying.
Let $b_{s,j}$, $\xi_{s,j}$ be the quantization level and deadzone control parameter respectively on subband $s$ at decomposition level $j$, $1 \le j \le J$, and let $\Delta_{s,0}^{(j)}$ be the bin-size at the finest quantization level $b_{s,j} = 0$. The deadzone bin-size of the quantizer applied on subband $s$ at level $j$ is $\Delta_s^{(j)} = \left(2-2\xi_{s,j}\right)2^{b_{s,j}}\Delta_{s,0}^{(j)}$, with $\xi_{s,j} \le 1/2$, while the size of the other bins is $2^{b_{s,j}}\Delta_{s,0}^{(j)}$. Similar to (3.40), one derives that the distortion contribution in the total MAXAD of subband $s$ at level $j$, $1 \le j \le J$, is given by:

( ) ( )
, ,
1 1
( )
, , , ,0
2 2 2
s j s j
b b
j
s j s j s j s
b

A = A , with
, ,
0
s j s j
b B s < , (3.42)
where
, s j
B is the number of quantization levels on subband s at level j . Following
a similar approach as in (3.19), the smallest upper bound of the MAXAD for an
intra-band wavelet codec is given by:

1
1 1 1 1
1
1
, , , ,
1 1
[( ) ( ) ( ) ( )]
S J
J j
tot tot S S J S J S s s j s j
j s
D M K b K K b

= =
s = A + A

, (3.43)
This shows that, similar to the smallest upper-bound of the MAXAD proposed in section 3.4.2, the MAXAD upper-bound $M_{tot}$ in (3.43) is a linear combination of subband distortion contributions, that is, of the form given by (3.33).
We notice that the particular case of $\xi_{s,j} = 1/2$ and $b_{s,j} = b$, $\forall s, j$, corresponds to embedded quantizers that are uniform at the finest quantization level $b_{s,j} = 0$, and to the progression scheme which led to (3.41). In this case, we can easily verify that (3.43) is indeed equivalent to (3.41).
If one refers to MESHGRID, its embedded quantizers are SAQ, corresponding to $\xi_{s,j} = 0$, for all $s, j$. The MAXAD distortion contribution $\Delta_{s,j}$ at the level of each subband is induced by the deadzone, and is given by $\Delta_{s,j} = \Delta_s^{(j)}/2 = 2^{b_{s,j}}\Delta_{s,0}^{(j)}$. In this case, (3.43) is equivalent to:
$$D_{tot} \le M_{tot} = \left(K_{S_1}\right)^{J}\, 2^{b_{S_1,J}}\,\Delta_{S_1,0}^{(J)} + \sum_{j=1}^{J}\sum_{s=1}^{S_1-1}\left(K_{S_1}\right)^{j-1}K_s\, 2^{b_{s,j}}\,\Delta_{s,0}^{(j)}. \qquad (3.44)$$
For the particular instantiation of the wavelet transform used by MESHGRID, expressed by (2.2) and (2.3), this becomes:
$$D_{tot} \le M_{tot} = \left(\frac{125}{64}\right)^{J}\frac{\Delta_{LLL}^{(J)}}{2} + \sum_{j=1}^{J}\left(\frac{125}{64}\right)^{j-1}\left[\frac{1}{2}\Delta_{HHH}^{(j)} + \frac{5}{8}\left(\Delta_{HHL}^{(j)}+\Delta_{HLH}^{(j)}+\Delta_{LHH}^{(j)}\right) + \frac{25}{32}\left(\Delta_{HLL}^{(j)}+\Delta_{LHL}^{(j)}+\Delta_{LLH}^{(j)}\right)\right], \qquad (3.45)$$
where the naming of the subbands is identical to that followed in (3.38).
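The bound (3.45) can be evaluated directly from the current deadzone bin-sizes, as in the following minimal Python sketch; the function name and the dictionary layout dz[(j, name)], holding the deadzone bin-size of the quantizer applied on subband name at level j, are illustrative assumptions rather than part of the codec.

    def maxad_upper_bound(dz, J):
        """Theoretical MAXAD upper-bound of (3.45) for SAQ quantizers."""
        bound = (125.0 / 64.0) ** J * dz[(J, 'LLL')] / 2.0
        for j in range(1, J + 1):
            scale = (125.0 / 64.0) ** (j - 1)
            bound += scale * (0.5 * dz[(j, 'HHH')]
                              + (5.0 / 8.0) * (dz[(j, 'HHL')] + dz[(j, 'HLH')] + dz[(j, 'LHH')])
                              + (25.0 / 32.0) * (dz[(j, 'HLL')] + dz[(j, 'LHL')] + dz[(j, 'LLH')]))
        return bound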
3.5.4.2 Statistical L-infinite Distortion Estimator
The L-infinite distortion estimator presented in the previous section is data
independent, since it is computed solely based on worst-case assumptions about the
error contributions coming from the different wavelet subbands. Consequently, the
wavelet coefficients need not to be actually decoded, and no inverse wavelet
transform needs to be performed. Therefore, this approach is very fast in
computational terms. However, as shown later experimentally, this approach also overestimates the actual L-infinite distortion, since it relies on worst-case assumptions. An improved approach is proposed in this section, in which the L-
infinite distortion estimate is computed based on data-dependent statistical
information. It will be shown experimentally that the accuracy of this second
approach is improved substantially, while the supplementary computational
expenses are minimal.
Let us assume an arbitrary wavelet subband $l$, $1 \le l \le L$, which is quantized using an embedded deadzone uniform scalar quantizer at level $b_l \ge 0$. The bin-size at the finest quantization level ($b_l = 0$) is denoted by $\Delta_l$; the deadzone size at level $b_l$ is $\left(2-2\xi_l\right)2^{b_l}\Delta_l$ and the size of the other bins is $2^{b_l}\Delta_l$. Also, denote by $p_{b_l}^{l}$ the probability that the wavelet coefficients in subband $l$ fall in the deadzone when quantized with the embedded quantizer at level $b_l$, $b_l \ge 0$.
Let $\varepsilon_{n,b_l}^{l}$ be a random variable (RV) denoting the quantization error occurring on a single wavelet coefficient $n$ of subband $l$, when the subband is quantized using an embedded uniform quantizer operating at level $b_l$. In [Alecu 2006, Alecu 2003b] it has been shown that each subband error contribution $\varepsilon_{b_l}^{l}$ to the total reconstruction error $\varepsilon_{tot}$ can be written as a linear combination of quantization errors $\varepsilon_{n,b_l}^{l}$ as:
$$\varepsilon_{b_l}^{l} = \sum_{n} k_n\,\varepsilon_{n,b_l}^{l}, \qquad (3.46)$$
where the $k_n$ are weighting factors that depend on the wavelet transform employed. For the generic quantizers in (3.39), the RVs $\varepsilon_{n,b_l}^{l}$ are uniformly distributed either on the interval $\left[-\frac{\left(2-2\xi_l\right)2^{b_l}\Delta_l}{2},\ \frac{\left(2-2\xi_l\right)2^{b_l}\Delta_l}{2}\right]$ if the wavelet coefficient falls in the deadzone, or on the interval $\left[-\frac{2^{b_l}\Delta_l}{2},\ \frac{2^{b_l}\Delta_l}{2}\right]$ in the opposite case. Hence, we can write the means of $\varepsilon_{n,b_l}^{l}$ as $\mu_{n,b_l}^{l} = 0$, and their variances as:
$$\left(\sigma_{n,b_l}^{l}\right)^2 = E\left\{\left(\varepsilon_{n,b_l}^{l}\right)^2\right\} = p_{b_l}^{l}\,\frac{\left(\left(2-2\xi_l\right)2^{b_l}\Delta_l\right)^2}{12} + \left(1-p_{b_l}^{l}\right)\frac{\left(2^{b_l}\Delta_l\right)^2}{12}. \qquad (3.47)$$
Using (3.46), (3.47) and the fact that the RVs $\varepsilon_{i,b_l}^{l}$ and $\varepsilon_{j,b_l}^{l}$ ($i \ne j$) are uncorrelated, the mean of $\varepsilon_{b_l}^{l}$ is written as $\mu_{b_l}^{l} = E\left\{\varepsilon_{b_l}^{l}\right\} = \sum_{n} k_n E\left\{\varepsilon_{n,b_l}^{l}\right\} = 0$, while the variance $\left(\sigma_{b_l}^{l}\right)^2$ is given by:
$$\left(\sigma_{b_l}^{l}\right)^2 = \sum_{n} k_n^2\, E\left\{\left(\varepsilon_{n,b_l}^{l}\right)^2\right\} = \frac{2^{2b_l}\Delta_l^2}{12}\left[4\left(1-\xi_l\right)^2 p_{b_l}^{l} + \left(1-p_{b_l}^{l}\right)\right]\sum_{n} k_n^2. \qquad (3.48)$$
We notice that for $b_l = 0$, equation (3.48) yields the variance $\left(\sigma_{0}^{l}\right)^2$:
$$\left(\sigma_{0}^{l}\right)^2 = \frac{\Delta_l^2}{12}\left[4\left(1-\xi_l\right)^2 p_{0}^{l} + \left(1-p_{0}^{l}\right)\right]\sum_{n} k_n^2. \qquad (3.49)$$
Denote by:
$$\tilde{p}_{l,b_l} \triangleq 2^{2b_l}\,\frac{4\left(1-\xi_l\right)^2 p_{b_l}^{l} + \left(1-p_{b_l}^{l}\right)}{4\left(1-\xi_l\right)^2 p_{0}^{l} + \left(1-p_{0}^{l}\right)}. \qquad (3.50)$$
From (3.48) and (3.49), and using the definition (3.50) of $\tilde{p}_{l,b_l}$, it follows that $\sigma_{b_l}^{l}$ can be statistically estimated by:
$$\sigma_{b_l}^{l} = \sqrt{\tilde{p}_{l,b_l}}\;\sigma_{0}^{l}. \qquad (3.51)$$
The standard deviation $\sigma_{tot}$ of $\varepsilon_{tot}$ is the accumulated standard deviation of the errors from all subbands, that is:
$$\sigma_{tot}^2 = \sum_{l}\left(\sigma_{b_l}^{l}\right)^2 = \sum_{l}\tilde{p}_{l,b_l}\left(\sigma_{0}^{l}\right)^2. \qquad (3.52)$$
It has been shown in [Alecu 2006] that the spatial-domain reconstruction error $\varepsilon_{tot}$ is Gaussian distributed. Indeed, $\varepsilon_{tot}$ is a linear combination of subband error contributions $\varepsilon_{b_l}^{l}$. The subband error contributions $\varepsilon_{b_l}^{l}$ are independent, as the quantization processes operating on the wavelet subbands are independent. Furthermore, each $\varepsilon_{b_l}^{l}$ is uniformly distributed, according to the high-rate model of quantization errors. In view of the Central-Limit Theorem [Papoulis 1987], the linear combination of a large number of independent uniformly-distributed random variables is (approximately) Gaussian. Hence, the probability $P$ that the variate $\varepsilon_{tot}$ takes a value in a given interval $[-t, t]$ can be written using the error function $\operatorname{erf}(\cdot)$ as:
$$P = \operatorname{prob}\left(-t < \varepsilon_{tot} < t\right) = \frac{1}{2}\left[\operatorname{erf}\!\left(\frac{t}{\sigma_{tot}\sqrt{2}}\right) - \operatorname{erf}\!\left(\frac{-t}{\sigma_{tot}\sqrt{2}}\right)\right] = \operatorname{erf}\!\left(\frac{t}{\sigma_{tot}\sqrt{2}}\right) \triangleq P_1. \qquad (3.53)$$
The estimated MAXAD can then be derived from the total standard deviation $\sigma_{tot}$ as:
$$M_{tot} = \sqrt{2}\,\operatorname{erf}^{-1}\left(P_1\right)\,\sigma_{tot}, \qquad (3.54)$$
with the estimation probability $P_1$, for which $\operatorname{erf}^{-1}\left(P_1\right) \in \left[\frac{5}{\sqrt{2}},\ \frac{6}{\sqrt{2}}\right]$, i.e. the MAXAD estimate corresponds to five to six standard deviations of the reconstruction error.
Hence, a statistical, data-dependent L-infinite estimator for an intra-band wavelet codec can be derived from (3.52) and (3.54) as:
$$M_{tot} = \sqrt{2}\,\operatorname{erf}^{-1}\left(P_1\right)\sqrt{\sum_{l} 2^{2b_l}\,\frac{4\left(1-\xi_l\right)^2 p_{b_l}^{l} + \left(1-p_{b_l}^{l}\right)}{4\left(1-\xi_l\right)^2 p_{0}^{l} + \left(1-p_{0}^{l}\right)}\left(\sigma_{0}^{l}\right)^2}. \qquad (3.55)$$
We point out that, in contrast to the smallest upper-bound of the MAXAD proposed in section 3.4.2, or to the theoretical estimator (3.43), the statistical L-infinite estimator in (3.55) is no longer linear (i.e. of the form given by (3.33)), but quadratic.
We notice that the statistical MAXAD estimator (3.55) is generically formulated for an arbitrary embedded quantizer family and subband transmission order. In the case of the SAQ used by MESHGRID, $\xi_l = 0$, for which the statistical L-infinite estimator in (3.55) becomes:
$$M_{tot} = \sqrt{2}\,\operatorname{erf}^{-1}\left(P_1\right)\sqrt{\sum_{l} 2^{2b_l}\,\frac{3 p_{b_l}^{l} + 1}{3 p_{0}^{l} + 1}\left(\sigma_{0}^{l}\right)^2}. \qquad (3.56)$$
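As a minimal illustration of how (3.50)-(3.56) combine, the Python sketch below computes the statistical MAXAD estimate for the SAQ case; the input names (per-subband quantization levels b, deadzone probabilities p0 and pb, and level-0 error standard deviations sigma0) are hypothetical quantities assumed to be measured by the encoder.

    import math

    def statistical_maxad(b, p0, pb, sigma0, erfinv_p1=6.0 / math.sqrt(2.0)):
        """Statistical MAXAD estimate of (3.56); erfinv_p1 = erf^{-1}(P_1)."""
        var_tot = 0.0
        for l in range(len(b)):
            p_tilde = (2.0 ** (2 * b[l])) * (3.0 * pb[l] + 1.0) / (3.0 * p0[l] + 1.0)  # eq. (3.50), xi = 0
            var_tot += p_tilde * sigma0[l] ** 2                                        # eq. (3.52)
        return math.sqrt(2.0) * erfinv_p1 * math.sqrt(var_tot)                         # eq. (3.54)

With erfinv_p1 = 6/sqrt(2), the estimate corresponds to six standard deviations of the (approximately Gaussian) reconstruction error.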
3.5.5 Rate-Distortion Optimization Algorithm
Rate allocation for the proposed scalable wavelet-based mesh coding approach
requires finding the optimal truncation points for each subband, such that the overall
bit-rate is minimized subject to an upper-bound on the distortion. This constrained-
optimization problem is solved by using a Lagrangian-optimization technique,
similar to the approach used for instance in JPEG-2000 [Taubman 2002].
Specifically, for every wavelet subband $l$, $1 \le l \le L$, the following distortion-rate slopes are computed:
$$\lambda_l\left(b_l\right) = q_l\,\frac{D_l\left(b_l+1\right)-D_l\left(b_l\right)}{R_l\left(b_l\right)-R_l\left(b_l+1\right)} = q_l\,\frac{\Delta D_{l,b_l}}{\Delta R_{l,b_l}}, \qquad (3.57)$$
where $b_l \in \mathbb{Z}^{+}$ is the quantization level in subband $l$, the distortions correspond to the L-2 (MSE) or the L-infinite (MAXAD) distortion contributions in the total distortion (see (3.33)), and $q_l$ are the weighting factors in (3.33), depending on the distortion metric employed. For MESHGRID, the factors $q_l$ in (3.57) are revealed by (3.38) and (3.45) in the L-2 and L-infinite cases respectively. Also, $\Delta R_{l,b_l}$ in (3.57) corresponds to the actual increase in rate when encoding quantization level $b_l$. Hence, the distortion-rate slope $\lambda_l$ in (3.57) expresses the ratio between the reduction in distortion and the associated increase in rate when an additional quantization level $b_l$ is encoded. We notice that in the considered subband transmission scheme, the higher levels $b$, $b > b_l$, are already encoded.
The terms $\Delta D_{l,b_l}$ in (3.57) represent the decrease in distortion in between two successive subband truncation points. To estimate $\Delta D_{l,b_l}$, one distinguishes several cases, as summarized next.
L-2 case. In the L-2 case, a rough estimate for $\Delta D_{l,b_l}$ is given by the classical high-rate approximation $\Delta D_{l,b_l} = 2^{2b_l}\Delta_l^2/12$. Such an estimate is data independent, corresponds to a classical prioritization of subband bitplanes used in wavelet coding [Munteanu 1999a, 1999b, Said 1996a, Shapiro 1993], including MESHGRID [Salomie 2004b], and has been used in our previous works [Cernea 2005, Cernea 2008b]. A better, data-dependent estimate for $\Delta D_{l,b_l}$ can be formed as follows. Let $S_{b_l}$ and $R_{b_l}$ denote the significance and refinement coding passes [Munteanu 1999a, Salomie 2004b] respectively corresponding to quantization level $b_l$. Suppose that an arbitrary coefficient is found to be significant during the significance pass (SP) $S_{b_l}$. Denote by the random variable $x$ the value of the coefficient, and assume that $x$ is uniformly distributed in the uncertainty interval $\left[2^{b_l}\Delta_l,\ 2^{b_l+1}\Delta_l\right)$. The expected square error in reconstructing the coefficient as $\hat{x} = 0$ is:
$$D_0 = E\left[\left(x-\hat{x}\right)^2\right] = E\left[x^2\right] = \frac{1}{2^{b_l}\Delta_l}\int_{2^{b_l}\Delta_l}^{2^{b_l+1}\Delta_l} x^2\,dx = \frac{7}{3}\,2^{2b_l}\Delta_l^2. \qquad (3.58)$$
Sending the current quantization level $b_l$ allows for reconstructing the value of the coefficient in the middle of the uncertainty interval, i.e. $\hat{x} = \frac{3}{2}\,2^{b_l}\Delta_l$. This reduces the subband distortion to:
$$D_1 = E\left[\left(x-\hat{x}\right)^2\right] = E\left[\left(x - \frac{3}{2}\,2^{b_l}\Delta_l\right)^2\right] = \frac{1}{2^{b_l}\Delta_l}\int_{2^{b_l}\Delta_l}^{2^{b_l+1}\Delta_l}\left(x - \frac{3}{2}\,2^{b_l}\Delta_l\right)^2 dx = \frac{1}{12}\,2^{2b_l}\Delta_l^2. \qquad (3.59)$$
Hence, the reduction $\Delta D_{SP}$ in the total distortion resulting from decoding a single coefficient during the significance pass $S_{b_l}$ is given by:
$$\Delta D_{SP} = D_0 - D_1 = \frac{27}{12}\,2^{2b_l}\Delta_l^2. \qquad (3.60)$$
The total average decrease in distortion in the significance pass is then given by:
$$\Delta D_{SP}^{b_l} = \left(p_{b_l+1}^{l} - p_{b_l}^{l}\right)\Delta D_{SP} = \left(p_{b_l+1}^{l} - p_{b_l}^{l}\right)\frac{27}{12}\,2^{2b_l}\Delta_l^2, \qquad (3.61)$$
where $\left(p_{b_l+1}^{l} - p_{b_l}^{l}\right)$ is the probability to identify a significant coefficient during the significance pass $S_{b_l}$. Similar to (3.59), if one assumes that the coefficient is refined
for all the SAQ thresholds up to $2^{b_l+1}$ (corresponding to the previous refinement pass $R_{b_l+1}$), the expected square error is given by:
$$D_2 = E\left[\left(x-\hat{x}\right)^2\right] = E\left[\left(x - \frac{3}{2}\,2^{b_l+1}\Delta_l\right)^2\right] = \frac{1}{2^{b_l+1}\Delta_l}\int_{2^{b_l+1}\Delta_l}^{2^{b_l+2}\Delta_l}\left(x - \frac{3}{2}\,2^{b_l+1}\Delta_l\right)^2 dx = \frac{1}{12}\,2^{2\left(b_l+1\right)}\Delta_l^2. \qquad (3.62)$$
It follows that the reduction $\Delta D_{RP}$ in the total distortion resulting from refining the coefficient during the refinement pass (RP) $R_{b_l}$ is given by:
$$\Delta D_{RP} = D_2 - D_1 = \frac{3}{12}\,2^{2b_l}\Delta_l^2. \qquad (3.63)$$
The total average decrease in distortion in the refinement pass is then given by:
$$\Delta D_{RP}^{b_l} = \left(1 - p_{b_l+1}^{l}\right)\Delta D_{RP} = \left(1 - p_{b_l+1}^{l}\right)\frac{3}{12}\,2^{2b_l}\Delta_l^2, \qquad (3.64)$$
where $\left(1 - p_{b_l+1}^{l}\right)$ is the probability to refine the coefficients in the current refinement pass $R_{b_l}$.
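The two averaged decreases can be computed directly from the deadzone probabilities, as in the minimal Python sketch below; the function names and arguments are illustrative (p_b and p_b1 denote the measured deadzone probabilities at levels b_l and b_l + 1).

    def delta_d_significance(delta_l, b_l, p_b, p_b1):
        """Average L-2 distortion decrease per coefficient in the significance pass, eq. (3.61)."""
        return (p_b1 - p_b) * (27.0 / 12.0) * (2.0 ** (2 * b_l)) * delta_l ** 2

    def delta_d_refinement(delta_l, b_l, p_b1):
        """Average L-2 distortion decrease per coefficient in the refinement pass, eq. (3.64)."""
        return (1.0 - p_b1) * (3.0 / 12.0) * (2.0 ** (2 * b_l)) * delta_l ** 2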
Theoretical L-infinite case. For the theoretical estimator (3.43), the decrease in distortion is calculated using (3.42) for each additional quantization level. If SAQ is used, the theoretical estimator is expressed by (3.44) and $\Delta D_{l,b_l} = 2^{b_l}\Delta_l$.
Statistical L-infinite case. For the statistical L-infinite estimator (3.55), the decrease in distortion is $\Delta D_{l,b_l} = \left(\sigma_{b_l+1}^{l}\right)^2 - \left(\sigma_{b_l}^{l}\right)^2$ and can be computed based on (3.51).
A summary of all the formulas involved for the different estimators is given in Table 3-1.
Table 3-1: Summary of the formulas expressing the different L-2 and L-infinite estimators.

  Estimator                         | Generic formulation | MESHGRID    | $\Delta D_{l,b_l}$
  Data-dependent L-2 estimator      | Eqn. (3.33)         | Eqn. (3.38) | SP: eqn. (3.61); RP: eqn. (3.64)
  Theoretical L-infinite estimator  | Eqn. (3.43)         | Eqn. (3.45) | $\Delta D_{l,b_l} = 2^{b_l}\Delta_l$
  Statistical L-infinite estimator  | Eqn. (3.55)         | Eqn. (3.56) | $\Delta D_{l,b_l} = \left(\sigma_{b_l+1}^{l}\right)^2 - \left(\sigma_{b_l}^{l}\right)^2$, using eqn. (3.51)
For each wavelet subband, the slopes $\lambda_l\left(b_l\right)$ in (3.57) are assumed to decrease monotonically when increasing the rate [Taubman 2002]. If some of the truncation points do not follow this constraint, they do not lie on the convex hull defined by the discrete set of distortion-rate points; hence, they will not be considered as feasible truncation points. To find the order in which the quantization levels corresponding to the subbands from all decomposition levels should be optimally selected, the $\lambda_l\left(b_l\right)$ from all subbands $l$ are merged and sorted in monotonically decreasing order. The order in which the $\lambda_l\left(b_l\right)$ are sorted indicates the order in which the subbands are encoded. This corresponds to a global distortion-rate curve for the entire mesh, for which the slopes are monotonically decreasing.
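The slope-merging step can be sketched as follows in Python; this is a simplified illustration, not the actual rate-allocation implementation, and the convex-hull pruning of non-monotonic truncation points is omitted for brevity.

    def order_layers(layers):
        """layers: list of (subband_id, level_b, delta_D, delta_R, q) tuples (hypothetical layout).
        Returns (slope, subband_id, level_b) sorted by decreasing weighted slope
        lambda = q * delta_D / delta_R, i.e. the encoding order implied by eq. (3.57)."""
        slopes = [(q * dD / dR, sub, b) for (sub, b, dD, dR, q) in layers if dR > 0]
        return sorted(slopes, reverse=True)

Truncating the resulting ordered stream at any point retains the layers that yield the largest distortion decrease per spent bit.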
3.6 RELATION BETWEEN MAXAD AND THE
HAUSDORFF DISTANCE
The Hausdorff distance gives an estimation of how similar two meshes are. Denoting the two meshes by $X$ and $Y$ respectively, and representing the vertices of each mesh by $x \in X$ and $y \in Y$ respectively, one can measure the Hausdorff distance between $X$ and $Y$ by computing the longest distance one is forced to travel from any vertex in mesh $X$ to the nearest vertex in mesh $Y$, and vice versa (see Figure 3-13 (a)). Mathematically, the Hausdorff distance can be formulated as:
$$h\left(X,Y\right) = \max\left\{\sup_{x\in X}\inf_{y\in Y} d\left(x,y\right),\ \sup_{y\in Y}\inf_{x\in X} d\left(y,x\right)\right\}, \qquad (3.65)$$
where sup represents the supremum and inf the infimum.
An important factor in obtaining a correct measurement when comparing two
meshes is to align them in order to ensure the same orientation, position and scale.
Figure 3-13 (a) shows the case when the two meshes X and Y

have similar shapes,
but different orientations, which results in an incorrectly large estimation of the
Hausdorff distance as compared to the aligned case in Figure 3-13 (b). In general,
when comparing any two meshes, this is a complex and time-consuming problem,
but an essential step for an accurate distortion measurement. However, in the case of
lossy compression of meshes, this step can be omitted since the compression process
is guaranteed to generate a mesh with the exact same alignment as the original mesh.
Figure 3-13: Two comparison cases of the same polygons X and Y: (a) unaligned;
(b) aligned. The Hausdorff distance calculation is shown for the first case.
Additionally, if between the set of the original vertices and the lossy coded vertices there exists a bijective correspondence, like in the MESHGRID codec, the complexity of estimating the distortion is further reduced. This being the case, an important advantage of performing scalable L-infinite-constrained compression of meshes stems from the fact that the Hausdorff distance between the original and reconstructed meshes at a given resolution is upper bounded by the MAXAD, as explained next.
Indeed, let $A$ be the set of losslessly decoded vertices at a given resolution and $\hat{A}$ be the set of vertices decoded at a given rate at the same resolution. A
progressive refinement of the vertex positions at the considered resolution is equivalent to refining the positions $\hat{a} \in \hat{A}$ until eventually $\hat{A} = A$, corresponding to the lossless reconstruction of the mesh. We note that $\operatorname{card}\{A\} = \operatorname{card}\{\hat{A}\}$, where $\operatorname{card}\{\cdot\}$ defines the cardinality of a set. Hence, for any $\hat{a} \in \hat{A}$ there exists a unique corresponding vertex $C(\hat{a}) \in A$, such that for the lossless reconstruction of that resolution level $\hat{a} = C(\hat{a})$.
The Hausdorff distance between $A$ and $\hat{A}$ is defined as $h\left(A,\hat{A}\right) = \max_{\hat{a}\in\hat{A}}\left\{\min_{a\in A} d\left(\hat{a},a\right)\right\}$, where $a$ and $\hat{a}$ are points of the sets $A$ and $\hat{A}$ respectively, and $d\left(\hat{a},a\right)$ is any metric between these points. If we take $d\left(\hat{a},a\right)$ as the Euclidean distance between $\hat{a}$ and $a$, then:
$$h\left(A,\hat{A}\right) = \max_{\hat{a}\in\hat{A}}\left\{\min_{a\in A} d\left(\hat{a},a\right)\right\} \le \max_{\hat{a}\in\hat{A}}\left\{d\left(\hat{a},C(\hat{a})\right)\right\} = \mathrm{MAXAD}. \qquad (3.66)$$
This shows that L-infinite-constrained coding actually implies Hausdorff-
distance-constrained coding of meshes. Practically, one has the possibility to set a
specific target bound on the L-infinite distortion, and due to (3.66), the Hausdorff
distance will be guaranteed to be below that target bound. Furthermore, the
proposed approach achieves scalability in L-infinite sense, corresponding to a
perfectly predictable L-infinite / Hausdorff-distance upper-bound for every
allowable truncation point. These features render the proposed L-infinite-
constrained coding approach a unique and interesting alternative to all mesh coding
techniques proposed so far in the literature.
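The inequality (3.66) can be checked numerically with a small NumPy sketch; the vertex sets below are random test data, not an actual MESHGRID decoding.

    import numpy as np

    def maxad(A, A_hat):
        # maximum distance between corresponding vertices (bijective correspondence)
        return np.max(np.linalg.norm(A_hat - A, axis=1))

    def one_sided_hausdorff(A, A_hat):
        # for every decoded vertex, distance to the nearest original vertex; then the maximum
        d = np.linalg.norm(A_hat[:, None, :] - A[None, :, :], axis=2)
        return np.max(np.min(d, axis=1))

    A = np.random.rand(100, 3)
    A_hat = A + 0.01 * (np.random.rand(100, 3) - 0.5)
    assert one_sided_hausdorff(A, A_hat) <= maxad(A, A_hat) + 1e-12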
L-infinite coding is also important if we address dynamic meshes or scenes
including dynamic objects. In such scenarios, it is clear that computing the
Hausdorff distance for all possible decodings of the input objects at each time
instance is a cumbersome task, even when employing fast tools to estimate the
distortion, such as the Metro tool, used for static scenes in [Tian 2007a].
Finally, another major benefit of the proposed L-infinite mesh coding approach is
that closed-form estimates of the L-infinite distortion are readily available, as
already shown in sections 3.5.4.1 and 3.5.4.2. Based on such closed-form estimates,
real-time algorithms solving the R-D optimization problem can be designed, as
shown in section 3.5.5.
3.7 MESHGRID INSTANTIATION
For the particular case of MESHGRID, it is important to observe that the
coordinates of the vertices do not need to be encoded explicitly, since their values
are derived from the coordinates of the reference-grid points (see section 2.2.1).
However, the errors generated by the lossy coding process of the reference grid directly affect the vertex coordinates in the reconstructed mesh. In that sense, it is shown next that the difference in vertex positions between the original mesh $\mathcal{M}$ and the reconstructed mesh $\mathcal{M}'$ is upper-bounded by the MAXAD, that is:
$$\left\|\mathbf{v}'-\mathbf{v}\right\| \le M, \quad \forall\,\mathbf{v}\in\mathcal{M},\ \mathbf{v}'\in\mathcal{M}'. \qquad (3.67)$$
Indeed, consider an arbitrary vertex position $\mathbf{v}\in\mathcal{M}$ lying on a reference-grid line defined by two reference-grid points denoted by $\mathbf{v}_{RG,1}$ and $\mathbf{v}_{RG,2}$. We assume that $\mathbf{v}$ is attached to the reference-grid point $\mathbf{v}_{RG,1}$, and let $o$ be the offset establishing the relative position of $\mathbf{v}$ with respect to $\mathbf{v}_{RG,1}$. Similar to (2.1), one can write:
$$\mathbf{v} = \mathbf{v}_{RG,1} + o\left(\mathbf{v}_{RG,2}-\mathbf{v}_{RG,1}\right). \qquad (3.68)$$
Suppose that L-infinite coding is applied to the reference-grid, and that after decoding, the new reference-grid coordinates corresponding to $\mathbf{v}_{RG,1}$ and $\mathbf{v}_{RG,2}$ are $\mathbf{v}'_{RG,1}$ and $\mathbf{v}'_{RG,2}$ respectively. Because of the L-infinite coding, one has:
$$\mathbf{v}'_{RG,1}-\mathbf{v}_{RG,1} = \mathbf{r}_{RG,1},\ \left\|\mathbf{r}_{RG,1}\right\| \le M, \qquad \mathbf{v}'_{RG,2}-\mathbf{v}_{RG,2} = \mathbf{r}_{RG,2},\ \left\|\mathbf{r}_{RG,2}\right\| \le M. \qquad (3.69)$$
Similar to (3.68), the new vertex position $\mathbf{v}'$ after L-infinite decoding will be given by:
$$\mathbf{v}' = \mathbf{v}'_{RG,1} + o\left(\mathbf{v}'_{RG,2}-\mathbf{v}'_{RG,1}\right). \qquad (3.70)$$
From (3.69) it follows that:
$$\mathbf{v}' = \left(\mathbf{v}_{RG,1}+\mathbf{r}_{RG,1}\right) + o\left[\left(\mathbf{v}_{RG,2}+\mathbf{r}_{RG,2}\right)-\left(\mathbf{v}_{RG,1}+\mathbf{r}_{RG,1}\right)\right] = \mathbf{v}_{RG,1} + o\left(\mathbf{v}_{RG,2}-\mathbf{v}_{RG,1}\right) + \mathbf{r}_{RG,1} + o\left(\mathbf{r}_{RG,2}-\mathbf{r}_{RG,1}\right). \qquad (3.71)$$
Replacing (3.68) in (3.71) yields:
$$\mathbf{v}' = \mathbf{v} + \left(1-o\right)\mathbf{r}_{RG,1} + o\,\mathbf{r}_{RG,2}. \qquad (3.72)$$
Using the simple property $\left\|\mathbf{a}+\mathbf{b}\right\| \le \left\|\mathbf{a}\right\| + \left\|\mathbf{b}\right\|$ and (3.69), and noting that the offset satisfies $0 \le o \le 1$, leads to (3.67), which ends the proof. One concludes that an L-infinite encoding of the reference-grid with a certain bound $M$ corresponds to an L-infinite encoding of the mesh with the same bound $M$.
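A small numerical check of this bound is given below (random test values, assuming, as in the MESHGRID construction, that the offset o lies in [0, 1]):

    import numpy as np

    M = 0.05                                    # hypothetical MAXAD bound on the reference-grid
    rng = np.random.default_rng(0)
    for _ in range(1000):
        v_rg1, v_rg2 = rng.random(3), rng.random(3)
        o = rng.random()                        # offset in [0, 1)
        r1 = rng.normal(size=3); r1 *= M * rng.random() / np.linalg.norm(r1)
        r2 = rng.normal(size=3); r2 *= M * rng.random() / np.linalg.norm(r2)
        v = v_rg1 + o * (v_rg2 - v_rg1)                              # eq. (3.68)
        v_prime = (v_rg1 + r1) + o * ((v_rg2 + r2) - (v_rg1 + r1))   # eq. (3.70)
        assert np.linalg.norm(v_prime - v) <= M + 1e-12              # eq. (3.67)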
3.8 EXPERIMENTAL RESULTS
In this section, an instantiation of the proposed scalable L-infinite mesh coding
approach is experimentally demonstrated by using MESHGRID [Salomie 2004b].
3.8.1 Error Distribution
The first set of experiments empirically verifies that the spatial-domain
reconstruction errors are Gaussian distributed for meshes lossily coded with MESHGRID.
Figure 3-14 and Figure 3-15 illustrate the cumulated vertex error distribution for the Heart and Humanoid models coded at target MAXAD bounds of 1% and 5%
respectively, using the theoretical and statistical L-infinite distortion estimators. The
fitted normal distribution is represented by the red line.

Figure 3-14: Cumulated vertex error distribution in x, y, z directions for the Heart
model coded at the target MAXAD bound of 1%, using the theoretical (left) and
statistical (right) L-infinite distortion estimators. The red line represents the fitted
normal distribution.

Figure 3-15: Cumulated vertex error distribution in x, y, z directions for the
Humanoid model coded at the target MAXAD bound of 5%, using the theoretical
(left) and statistical (right) L-infinite distortion estimators. The red line represents
the fitted normal distribution.
In addition, Table 3-2 summarizes the error distribution statistics for more models
coded at various bit-rates.
The results indicate that the spatial error distribution can be modeled using a normal distribution. This can also be confirmed using metrics such as the Kullback-Leibler distance between the actual and modeled distributions. One concludes that the assumptions made in sections 3.3 and 3.5.3 are valid.


Table 3-2: Error statistics for several lossy coded MESHGRID models obtained when allocating rate using the theoretical and statistical L-infinite distortions, for the same RG bit-rates.

  Model            | bpv   | L-inf Theoretical (mean, std. dev., M) | L-inf Statistical (mean, std. dev., M)
  Heart            | 2.21  | 0.165  0.069  0.517                    | 0.170  0.068  0.522
                   | 0.76  | 0.389  0.182  1.149                    | 0.385  0.185  1.017
                   | 0.22  | 0.999  0.376  2.550                    | 0.952  0.363  2.492
  Humanoid         | 16.62 | 0.002  0.001  0.007                    | 0.002  0.001  0.007
                   | 11.76 | 0.010  0.005  0.035                    | 0.009  0.006  0.037
                   | 1.68  | 0.238  0.132  0.918                    | 0.267  0.123  0.924
  Swiss Landscape  | 1.62  | 0.123  0.157  1.474                    | 0.123  0.157  1.481
                   | 1.03  | 0.388  0.314  2.468                    | 0.391  0.322  2.540
                   | 0.49  | 0.742  0.646  4.892                    | 0.742  0.646  4.892


Lossless (38.68), 0.5% (2.61), 1.5% (0.69), 5% (0.08), 20% (0.008), 100% ($< 4\times10^{-5}$)
Figure 3-16: L-infinite scalability provided by the proposed approach: Heart model
decoded at various target MAXAD bounds. The target MAXAD values (%) and the
resulting bit-rates (bpv) needed to encode the geometry are indicated under each model.
3.8.2 L-infinite Scalability
The second set of experiments illustrates the scalability in L-infinite sense
provided by the proposed system. Figure 3-16 and Figure 3-17 depict the visual
results obtained when decoding two MESHGRID models for a broad range of target
MAXAD bounds, when using the theoretical MAXAD estimator (3.45). The target
MAXAD is expressed in percentages relative to the size of the bounding box
containing the model. The first pictures in Figure 3-16 and Figure 3-17 are the
losslessly encoded versions of the Heart and Melted Tangle Cube models respectively.
The bit-rate (BR) in bits per vertex needed to encode the geometry at each target
MAXAD is also determined and indicated in parentheses under each model. Notice
that for the last picture, all the bit-planes are truncated, that is, practically no rate is
spent to encode the RG (apart from encoding the corners of the bounding box
containing the object); in this case, the reconstructed RG is uniformly distributed,
leading to the significantly altered shape of the models.

Lossless (64.65), 0.5% (4.90), 1.5% (2.61), 5% (0.38), 20% (0.010), 100% ($< 6\times10^{-5}$)
Figure 3-17: L-infinite scalability provided by the proposed approach: Melted
Tangle Cube model decoded at various target MAXAD bounds. The target MAXAD
values (%) and the resulting bit-rates (bpv) needed to encode the geometry are
indicated under each model.
In the following set of experiments, the Rabbit (see Figure 3-18) and Feline (see
Figure 3-19) models have been compressed at target MAXAD values that keep the
reconstructed objects in the visually (near) lossless range.
Lossless (59.93), 0.1% (34.09), 0.3% (25.95), 0.5% (21.65)
Figure 3-18: L-infinite scalability provided by the proposed approach: Rabbit model
decoded at various target MAXAD bounds. The target MAXAD values (%) and the
resulting bit-rates (bpv) needed to encode the geometry are indicated under each
model.
These results indicate that MAXAD bounds of at most 5% (depending on the complexity of the model) need to be targeted, to avoid potentially significant
geometric distortions in the decoded objects. For models employing a very fine non-
uniform reference-grid, i.e. Rabbit and Feline, the maximum value for MAXAD is
reached around the bound of 1%, where all the bitplanes are cut (see Figure 3-20). It
is important to observe also the smooth visual improvement generated when
decreasing the target MAXAD. We note also that for small target MAXAD values,
there are no visual differences between the original and the lossy-encoded versions
of the mesh, while the achieved compression ratios are high (the geometry requires
about 1% of the lossless rate). These results highlighting the efficiency of the L-
infinite codec show that the L-infinite metric is indeed a suitable distortion metric in
scalable coding of 3D models, and demonstrate the scalability in L-infinite sense of
the proposed mesh coding approach.
Lossless (63.80), 0.1% (33.75), 0.3% (24.97), 0.5% (20.69)
Figure 3-19: L-infinite scalability provided by the proposed approach: Feline model
decoded at various target MAXAD bounds. The target MAXAD values (%) and the
resulting bit-rates (bpv) needed to encode the geometry are indicated under each
model.
Figure 3-20: The Rabbit model decoded at MAXAD bound of 1%.
3.8.3 Distortion Metrics Comparison: L-2 vs. L-infinite
Additional experiments use both the L-2 and L-infinite distortion metrics and
compare the results numerically and visually. For the L-2 distortion metric, we
implemented both data-independent and data-dependent estimators. We note that in
our instantiation, the data-independent L-2-driven codec actually corresponds to the
standard MPEG-4 AFX MESHGRID coding system.
The first experiment is intended to compare the data-independent and data-
dependent L-2 distortion estimators, which are judged against the L-infinite
distortion estimator. The Humanoid model is compressed at a user-specified target
MAXAD bound (1.5%) using the L-infinite codec, employing the statistical
estimator (3.56). Subsequently, the L-2-driven codecs compress the geometry of the
model at the same rate as the L-infinite version, but minimize the L-2 distortion
instead. The results are compared visually in Figure 3-21. We point out that all
systems use the same entropy coding engine (i.e. MESHGRID), so the difference
between them comes only from the different distortion metrics employed.

Data-independent L-2 codec; Data-dependent L-2 codec; Original; L-infinite codec
Figure 3-21: Zoom on the shoulder area of the Humanoid model compressed with
the L-2 and L-infinite codecs at a target MAXAD of 1.5% (1.68 bpv). The data-
independent L-2-driven codec corresponds to the standard MPEG-4 AFX
MESHGRID system.
The results presented in Figure 3-21 show that the standard MPEG-4 AFX
MESHGRID system fails to provide acceptable results at low rates. However,
employing the proposed data-dependent L-2 estimator significantly improves the L-
2 coding performance, bringing it visually close to the L-infinite system. Given its
much better performance at low rates, in the subsequent experiments, we use the
data-dependent L-2 estimator instead of the original data-independent L-2 estimator
integrated in the standard MPEG-4 AFX MESHGRID system and used to produce our
results in [Cernea 2005, Cernea 2008b]. Additional experiments presented in Figure 3-22 complement the visual evaluation of the data-independent L-2 codec
performance. On the top row, the pictures depict four decodings of the Humanoid
model at several bitrates employing the data-independent L-2 codec. Note that the
color shades indicate the areas where the vertex errors exceed the requested target
MAXAD bound. The pictures on the bottom row show the decodings for the same
rates using the L-infinite codec. As expected, all models are entirely green since the
rate-allocation was optimized in L-infinite sense; nevertheless, they are included as
reference for a visual comparison of the geometry.
MAXAD = 1.5% (1.68 bpv); MAXAD = 1.0% (2.69 bpv); MAXAD = 0.5% (5.46 bpv); MAXAD = 0.1% (9.92 bpv)
Figure 3-22: L-2 (top) versus L-infinite (bottom) coding of the Humanoid model.
The L-2-driven codec corresponds to the standard MPEG-4 AFX MESHGRID
system. The color shades indicate the areas where the vertex errors exceed the
requested target MAXAD bound. The target MAXAD values (%) and the resulting
reference-grid bit-rates (bpv) needed to encode the geometry are indicated for each
pair of models.
In the next experiments, we plot the actual MAXAD versus rate for the L-2 and
L-infinite codecs for two models and for a broad range of rates (Figure 3-23). We
notice that all the dots on the graphs in Figure 3-23 are decodable points, where the
local error is clearly upper-bounded and guaranteed. The sufficient density of points
shown in Figure 3-23 clearly indicates the fine-granularity in L-infinite sense
provided by the proposed approach. These results also indicate that very large gaps in terms of MAXAD can occur for the L-2 codec, this phenomenon being completely uncontrollable for this system. This shows that an optimization with
respect to the L-2 distortion lays no claim on minimizing the local error, in this
sense having the potential of introducing large local error-spikes (i.e., large vertex-
position errors) that otherwise are not present in an L-infinite-coding framework.
[Plots: MAXAD (%) versus rate (bpv); curves: L-2, Theoretical L-infinite, Statistical L-infinite.]
Figure 3-23 (part 1 of 2): MAXAD versus rate for the Mars (top) and Heart
(bottom) models for the L-2 (data-dependent) and L-infinite coding systems.
[Plot: MAXAD (%) versus rate (bpv); curves: L-2, Theoretical L-infinite, Statistical L-infinite.]
Figure 3-23 (part 2 of 2): MAXAD versus rate for the Melted Tangle Cube model for the L-2 (data-dependent) and L-infinite coding systems.
This phenomenon is illustrated visually in Figure 3-24, Figure 3-25, Figure 3-26 and Figure 3-27. The experiments are performed on Humanoid at various MAXAD targets (Figure 3-24), and on four other models in Figure 3-25, Figure 3-26 and Figure 3-27, namely Melted Tangle Cube, Heart, Swiss Landscape and a smooth surface with sharp local features (Mars). Similar to above, the L-infinite codec decodes the object at specific MAXAD bounds, while the L-2 codec operates at the same rates as the L-infinite codec, minimizing the L-2 distortion for each rate. The target MAXAD values and the resulting bit-rates are given in the figures. Also, the RMSE (root mean-square error) values at these bit-rates are given for each codec. The local errors are indicated using colours, the assigned colour being proportional to the magnitude of the local error. In the considered colour maps, exceeding the MAXAD bound is indicated in red. It is important to point out that the colour range changes with the target MAXAD, so the colour maps are different for every rate. Therefore, the visual comparison should be performed only among the different estimators and not among rates.
[Figure 3-24 panels — L-2 (top): RMSE = 0.112%, 0.079%, 0.065%; MAXAD = 3.0% (0.68 bpv), 1.6% (1.53 bpv), 1.2% (2.13 bpv); L-infinite (bottom): RMSE = 0.114%, 0.088%, 0.080%]

Figure 3-24: L-2 (top) versus statistical L-infinite (bottom) coding of the Humanoid
model. The color shades are proportional to the local error and exceeding the
MAXAD bound is indicated in red.
[Figure 3-25 panels — Melted Tangle Cube, MAXAD = 2.0% (1.11 bpv): RMSE = 0.099% (left) vs. 0.113% (right); Heart, MAXAD = 0.8% (1.10 bpv): RMSE = 0.038% (left) vs. 0.045% (right)]
Figure 3-25: L-2 (left) versus L-infinite (right) coding of the Melted Tangle Cube and
Heart models. The ellipses highlight the areas where the vertex errors exceed the
requested MAXAD bound.
[Figure 3-26 panels: RMSE = 0.061% (top), MAXAD = 1.6% (0.05 bpv), RMSE = 0.074% (bottom)]

Figure 3-26: L-2 (top) versus L-infinite (bottom) coding of the Swiss model. The ellipse highlights the area where the vertex errors exceed the requested MAXAD bound.
[Figure 3-27 panels: RMSE = 0.061% (top), MAXAD = 1.6% (0.05 bpv), RMSE = 0.074% (bottom)]

Figure 3-27: L-2 (top) versus L-infinite (bottom) coding of the Mars model. The
ellipses highlight the areas where the vertex errors exceed the requested MAXAD
bound.
These results demonstrate the local-error control performed by the proposed L-
infinite coding approach, which, as expected, never exceeds the imposed MAXAD
bound. Visually though, the results are similar for the L-infinite and data-dependent
L-2 estimators. This is due to the particular nature of MESHGRID; indeed, for Heart
for instance, for a MAXAD bound of 0.8% (Figure 3-25 bottom), the local error
exceeds the MAXAD on 1121 reference-grid points (2.34%), going up to 4155
points (8.68%) for a MAXAD bound of 1%. Not all these errors are visible on the
mesh, and this is due to the very specific nature of the MESHGRID system, for which
the mesh vertices are connected only to a part of the reference-grid points [Salomie
2004b]. Hence, errors on the reference-grid are translated to the mesh only for those
RG points that are directly linked to mesh vertices. Nonetheless, it is important to
remark that, although yielding very similar visual results to the L-infinite codec, the
L-2 codec cannot claim any kind of local error control, at any rate and irrespective
of the mesh type (smooth or sharp).
We also notice that in L-2 sense (i.e. in terms of MSE), the L-2 codec version is systematically better than the L-infinite version. This must be the case, and it comes as no surprise: for any rate, the L-2 codec reaches the minimum L-2 distortion, because the rate allocation is optimized in L-2 sense. However, we would like to highlight that the MSE differences between the L-2 and L-infinite codec versions are small. Furthermore, despite providing a smaller MSE, the L-2 codec may be affected by large local errors, in particular at low rates.
Overall, these experimental results show that the proposed L-infinite coding
approach (i) performs local-error control, in contrast to the global-error control in
the case of the L-2 version, (ii) provides L-infinite scalability, and (iii) outperforms
the standard MPEG-4 AFX MESHGRID system in L-infinite sense.
3.8.4 Distortion Metrics Comparison: Theoretical vs. Statistical
L-infinite
The fourth set of experiments compares the two proposed L-infinite estimators.
The accuracy in estimating the real MAXAD is presented graphically in Figure 3-28. As shown in Figure 3-28, while the theoretical approach quickly diverges from the real MAXAD values, the statistical method is much more accurate, following the real MAXAD curve more closely. We notice also that all the dots on the
graphs in Figure 3-28 are decodable points, where the local error is clearly upper-
bounded and guaranteed. The sufficient density of points shown in Figure 3-28
clearly indicates the fine-granularity in L-infinite sense provided by the proposed
approach.
[Plots: actual MAXAD versus the theoretical and statistical MAXAD estimates as a function of rate (bpv).]

Figure 3-28: Performance evaluation of the theoretical (orange line) and statistical (green line) L-infinite distortion estimators versus the actual L-infinite distortion (blue line) obtained on the Humanoid and Heart models. The X and Y axes depict the rate in bits per vertex (bpv) and the distortion as MAXAD, respectively.



Table 3-3: Hausdorff distance (%) versus the real MAXAD values (%) for the Heart, Humanoid, Swiss Landscape, Feline and Venus models obtained at various RG bit-rates.

Model              Grid BR (bpv)   Hausdorff %   MAXAD %
Heart                   2.21          0.50         0.52
                        0.96          1.00         1.02
                        0.22          2.49         2.49
Humanoid               16.62          0.01         0.01
                       13.06          0.02         0.02
                       11.76          0.04         0.04
                        1.68          0.89         0.92
Swiss Landscape         1.62          1.48         1.48
                        1.03          2.47         2.54
                        0.49          4.46         4.89
Feline                 24.74          0.06         0.06
                       15.96          0.17         0.17
                       11.68          0.30         0.30
Venus                  26.27          0.05         0.05
                       15.15          0.17         0.17
                       11.96          0.26         0.29
Table 3-4: Execution time (seconds) required to estimate distortion for the L-2, the (data-independent) theoretical L-infinite and the statistical L-infinite estimators on the Heart (42312 triangles), Humanoid (27020 triangles) and Swiss Landscape (668024 triangles) models.

Model              L-infinite theoretical   L-infinite statistical    L-2
Heart                      0.115                   0.265             0.115
Humanoid                   0.107                   0.208             0.105
Swiss Landscape            0.266                   0.599             0.265
Table 3-3 reports the Hausdorff distance values and the real MAXAD values obtained on five different models at various rates. One notes that the Hausdorff distance follows closely the real MAXAD, and is indeed upper bounded by the MAXAD, as stated in [Cernea 2008a] and by equation (3.66).
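To make the relation between the two measures concrete, the following small sketch (our own illustration in Python with NumPy; none of it is part of the MESHGRID software) computes both measures on synthetic vertex sets and checks that the Hausdorff distance never exceeds the MAXAD: for every original vertex, the nearest decoded vertex is at most as far away as its corresponding decoded vertex.

```python
import numpy as np

def maxad(v_orig, v_dec):
    """MAXAD: largest displacement between corresponding vertices."""
    return np.max(np.linalg.norm(v_orig - v_dec, axis=1))

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets (brute force)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

rng = np.random.default_rng(0)
v = rng.random((500, 3))                           # original vertex positions
v_hat = v + 0.01 * rng.standard_normal(v.shape)    # decoded (quantized) positions

# The nearest decoded vertex is never farther than the corresponding decoded vertex,
# hence the Hausdorff distance cannot exceed the MAXAD.
assert hausdorff(v, v_hat) <= maxad(v, v_hat) + 1e-12
print(maxad(v, v_hat), hausdorff(v, v_hat))
```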
It is important to notice that both L-infinite methods estimate the real MAXAD introduced by cutting a certain bitplane, without actually decoding the models. This makes them very fast in assessing the actual distortion, and thus suitable for real-time applications. While the theoretical approach (3.45) is purely data-independent and thus has practically negligible computational cost, the second one (3.56) uses statistical estimates derived from the data being coded, which increases the computational cost, but only slightly. This is shown in Table 3-4, reporting the execution times for the L-2, theoretical and statistical L-infinite estimators operating at maximum rate (i.e. for all bit-planes). Overall, the low execution times for the L-2, theoretical and statistical L-infinite estimators show that, even without speed optimizations, the distortion estimation approaches indeed allow for real-time implementations.
The results in Table 3-4 show that the most complex estimator is the statistical L-infinite estimator, given by (3.56). Its total complexity is proportional to the number of truncation points. The number of truncation points depends on the number $B_l$ of bit-planes in each subband $l$, and is given by $\sum_{l=1}^{L} B_l$. Thus, the total complexity can be written as $\Theta = \Theta_0 \sum_{l=1}^{L} B_l$, where $\Theta_0$ is the complexity required to estimate the MAXAD for an arbitrary truncation point, corresponding to a certain set of subband quantization levels $b_l$. We notice that the total complexity does not directly depend on the number of vertices. However, it is influenced by the number of vertices: objects that are more complex will have more subbands and bit-planes.
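As a small numerical illustration of this count, the snippet below (with invented bit-plane numbers, not taken from any of the thesis models) evaluates $\Theta = \Theta_0 \sum_l B_l$:

```python
# Hypothetical number of bit-planes B_l for each wavelet subband l.
bitplanes = [9, 8, 8, 7, 7, 6, 5]

truncation_points = sum(bitplanes)        # sum over l of B_l: eligible truncation points
theta_0 = 1.0                             # cost of one MAXAD estimate (arbitrary unit)
total_cost = theta_0 * truncation_points  # Theta = Theta_0 * sum_l B_l
print(truncation_points, total_cost)
```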
Based on these results, we can conclude that a data-dependent L-2 estimator proves to be sufficient for applications for which geometry accuracy is not critical. However, L-infinite coding is the only available option for applications for which preserving geometry accuracy is compulsory. Examples of such applications include coding of topographic landscapes, where each vertex location is associated with a specific measurement, industrial applications (3D CAD, architectural design, 3D representation and coding of industrial devices, assemblies and installations), mesh geometry watermarking, etc. In such applications, vertex positions correspond to specific measurements (e.g. heights) or are used to derive specific measurements; hence, in order to ensure a controllable tolerance on the measurement error, bounding the local error when compressing the model's geometry is of critical importance.
As a final remark, the L-infinite coding approaches proposed in this chapter are
not solely limited to static models, but they can be extended to dynamic models as
well. Chapter 5 shows that the proposed L-infinite coding approach can be
successfully employed also in scalable compression of dynamic models.
3.9 CONCLUSIONS
This chapter introduces the novel concept of scalable L-infinite-oriented coding
of meshes. A thorough analysis of several design options reveals that an intra-band
wavelet-based coding approach should be followed in order to provide fine-granular
scalability in L-infinite sense. In this context, a novel approach for scalable wavelet-
based coding of meshes is proposed, which allows for minimizing the rate subject to
an L-infinite distortion constraint. Two L-infinite distortion estimators are presented,
expressing the L-infinite distortion in the spatial domain as a function of
quantization errors produced in the wavelet domain. Based on these, the proposed L-
infinite codec optimizes the rate allocation for which the L-infinite distortion (and
consequently the Hausdorff distance) is upper-bounded by a user-defined bound,
and guaranteed to be below that bound. This is an interesting and unique feature in
the context of 3D object coding.
The proposed approach provides scalability in L-infinite sense, that is, any
decoding of the input stream will correspond to a perfectly predictable upper-bound
on the L-infinite distortion and Hausdorff distance. In other words, solving an L-
infinite-constrained optimization problem is equivalent to finding a rate allocation
such that the Hausdorff distance at the decoded resolution is upper-bounded. This
represents a unique and interesting alternative to all mesh coding techniques
proposed so far in the literature.
The experimental results demonstrate that the proposed approach outperforms the
standard MPEG-4 AFX MESHGRID coding system in L-infinite sense. Furthermore,
a data-dependent L-2 estimator is also proposed, significantly improving the coding
performance at low rates of the original MPEG-4 AFX MESHGRID coding system.
Based on the experimental results, we conclude that a data-dependent L-2 estimator
is sufficient for applications for which geometry accuracy is not critical. However,
L-infinite coding is the only available option for applications for which preserving
geometry accuracy is compulsory.
Finally, the proposed approach preserves all the scalability features and animation
capabilities of the employed scalable mesh codec and allows for fast, real-time
implementations of the rate-allocation. These are particularly important in real-time
applications and in the context of MPEG-4 AFX. With respect to the latter, the
proposed system allows for developing a scalable L-infinite coding extension of the
MESHGRID system, without changing the characteristics and/or the existing syntax
of this MPEG-4 standard.







Chapter 4
SCALABLE ERROR-RESILIENT
CODING OF MESHES
4.1 INTRODUCTION
Scalable mesh coding techniques such as 3D Mesh Coding [Taubin 1998b],
Wavelet Subdivision Surfaces [Lounsbery 1997], or MESHGRID [Salomie 2004a]
provide bandwidth adaptation and offer a broad range of functionalities, including
quality and resolution scalability and view-dependent decoding. However, in the
context of network transmissions, they do not address major network considerations
such as packet losses. Because of the sensitivity and interdependence of the encoded
bitstream layers, when a packet is lost due to transmission errors, all the following
packets will have to be discarded. In general, without appropriate measures, scalable
mesh coding techniques produce bitstreams that are very sensitive to transmission
errors, i.e. even a single bit-error may propagate and cause the decoder to lose
synchronization and eventually collapse. This results in a catastrophic distortion in
the decoded 3D model. Appropriate error protection mechanisms are therefore of
vital importance in transmission over error-prone channels, in order to protect the
bitstream against severe degradations caused by network losses and to reduce the
end-to-end delay.
This problem is addressed in this chapter, which proposes a novel joint source and
channel coding (JSCC) approach for meshes providing optimized resilience against
transmission losses and maintaining the scalability features of the employed scalable
source coder.
The chapter is structured as follows. We begin with a survey of the state-of-the-art error-resilient coding techniques in section 4.2. Next, section 4.3 formulates
the JSCC problem and presents its complete derivation. Section 4.4 reports the
experimental results obtained with the MESHGRID instantiation of the proposed
JSCC approach. Finally, section 4.5 draws the conclusions of this work.
4.2 ERROR-RESILIENT MESH CODING
TECHNIQUES
In the literature, there is little work addressing error-resilient coding of meshes. The available techniques can be divided into two main categories: (i) mesh partitioning schemes, which segment the input mesh into several sub-meshes (or regions) that are encoded and error-protected individually, and (ii) progressive mesh coding schemes, which adopt a scalable mesh coding approach such that the mesh is split into multiple resolution levels that are error-protected individually.
4.2.1 Mesh Partitioning Techniques
In the first category, a solution proposed by Yan et al. in [Yan 2001] is to
partition the 3D mesh to be transmitted into small segments with joint boundaries
and of uniform size, which are coded and protected individually. The approach
extends the error-free constructive traversal compression scheme proposed by Li
and Kuo [Li 1998b]. The size of a segment is determined adaptively based on the
channel error rate. The topology and geometry information of each segment and
each joint boundary is coded independently. The coded topology and first several
important bit-planes of the joint-boundary data are protected against channel errors
by using the Bose-Chaudhuri-Hocquenghem (BCH) error-correcting codes. At the
decoder, each segment is decoded and checked for channel errors. The decoded
joint-boundary information is used to perform data recovery and error concealment
on the corrupted segment data. All decoded segments are combined together
according to their configuration to reconstruct all connected components of the
complete 3D model. In [Yan 2005], four mesh segmentation schemes are examined,
i.e. multiseed traversal, threshold traversal, morphing-based volume splitting, and
content-based segmentation. Although the results are interesting, a significant
disadvantage of this approach is that processing is performed at a single resolution,
not allowing for a scalable bitstream transmission and reconstruction of the input
mesh.
Recently, Park et al. [Park 2006, 2003] addressed this issue and proposed a
similar method enhanced with a shape-adaptive partitioning scheme, wherein each
partition is progressively compressed. The employed mesh segmentation algorithm
is based on a generalized Lloyd algorithm (GLA) [Linde 1980] for 3D meshes. The
input mesh surface is coarsely divided into smooth and detailed regions, and each
region is further divided into partitions of similar sizes. A progressive encoder then
independently encodes each segment. The encoder [Park 2002] is an improved
version of Pajarola and Rossignac's algorithm [Pajarola 2000], which uses cosine index prediction and a two-stage prediction for connectivity and geometry data
respectively. Additionally, the proposed algorithm employs a boundary collapse
rule, so that the decoder can seamlessly zip the boundaries between segments at
different levels of detail (LOD). Anchor vertices, which are vertices that are
connected to more than three segments, are used by the decoder in order to zip the
boundaries of different segments. The corrupted segments are recovered by using an
error concealment scheme, which exploits the surface and boundary information of
adjacent segments. Despite these measures, in adverse channel conditions, the recovered mesh at the decoder side has potentially large local errors, i.e. spikes, or even missing pieces. Moreover, only the encoded bit stream is transmitted through an error-prone channel, while the anchor-vertex information is assumed to be sent through an error-free channel. Hence, this error-resilient coding approach is based only on mesh partitioning and error concealment, without employing any kind of forward error correction, and it is therefore limited to their specific 3D mesh compression scheme [Park 2002].
4.2.2 Progressive Mesh Coding Techniques
The second category of error protection algorithms for meshes abandons the mesh
partitioning approach, addressing progressive mesh coding schemes directly
[Pajarola 2000], such as in [Al-Regib 2002, Al-Regib 2005a, Al-Regib 2005b,
2005c, Chen 2005, Li 2006, Tian 2007a].
Among them, Al-Regib et al. [Al-Regib 2002] proposed an algorithm that
allocates the code rates for forward error correction employing modeled rate-
distortion curves. The Compressed Progressive Mesh (CPM) [Pajarola 2000]
algorithm is used to generate a hierarchical bit-stream representing different levels
of details. CPM is based on two operations: edge-collapse and vertex-split. These
two operations are illustrated in Figure 4-1. The edge-collapse and vertex-split
operations are applied at the encoder and decoder, respectively. Each edge-collapse
operation is represented by two classes of information, namely connectivity and
geometry. The connectivity information specifies whether a vertex is to be split or
not as well as the corresponding edges to be split, while the geometry information
specifies the coordinates of the new added vertices.
The encoding process is iterative. At the beginning of each iteration, a subset of
edges is chosen to be collapsed. These edges have to satisfy certain restrictions so
that they can be collapsed within the current LOD [Pajarola 2000]. These
restrictions make the edges being collapsed independent of each other, and hence,
the decoding process (vertex-split operation) for a given vertex is independent from
the others. However, these restrictions also constrain the compression algorithm to optimizing the rate based only on the current LOD, disregarding the redundant information from the previous LODs.
Additionally, we note that before generating the bit-stream for the collapse
operations of a certain LOD, the vertices are first sorted. In order to stay
synchronized, both the encoder and the decoder should have the same ordering of
vertices at the beginning of each iteration.
In order to provide resilience against transmission errors, forward error correction
(FEC) is applied. The FEC codes used in that work are Reed-Solomon (RS) codes [Rizzo 1997]. Subsequently, the block of packets (BOP) [Horn 1999] technique is adapted and used as the packetizing method. In this method, the data is placed in horizontal packets and then FEC is applied across the BOPs, vertically. Such a method is most appropriate for packet networks where burst errors are common [Horn 1999]. Each packet is protected with a FEC code determined via a distortion function that accounts, independently, for the channel packet-loss rate, the nature of the encoded 3D mesh and the error protection bit-budget.
The decoder combines all correctly received packets of a certain BOP and counts the number of lost packets. Let $(n, k_l)$ be the RS code applied to a given BOP. If the number of lost packets is not more than $(n - k_l)$, then the decoder will be able to recover all lost packets in this BOP. Otherwise, the decoder considers these packets as lost and irrecoverable. If a certain part of the bit-stream is not decoded, then this part and all parts received afterwards are considered to be lost.
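The recovery rule and the discard-after-failure behaviour described above can be summarised in a few lines of Python; this is only a schematic of the mechanism (the RS decoding itself is not implemented), and all names and parameters below are our own choices for the example.

```python
import random

def bop_recoverable(n, k, lost):
    """An (n, k_l) RS code applied across a BOP can recover up to n - k_l lost packets."""
    return lost <= n - k

def decodable_prefix(bops, loss_rate, rng):
    """Count how many consecutive BOPs decode; once a BOP fails, it and every
    following part of the bit-stream is discarded."""
    decoded = 0
    for (n, k) in bops:
        lost = sum(rng.random() < loss_rate for _ in range(n))
        if not bop_recoverable(n, k, lost):
            break
        decoded += 1
    return decoded

rng = random.Random(1)
# One BOP per resolution level; stronger protection (smaller k) for the coarser levels.
bops = [(32, 20), (32, 24), (32, 28)]
print(decodable_prefix(bops, loss_rate=0.1, rng=rng))
```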

Figure 4-1: The edge-collapse and vertex-split operations of the Compressed
Progressive Mesh algorithm.
The base-mesh and every level of detail bit-streams are each packetized into one
BOP. Hence, there are as many packets as mesh resolution levels. However, since
the bit-stream is packetized into a relatively small number of packets, the solution
proposed by [Al-Regib 2002] remains sensitive to packet losses. Additionally, the
algorithm pre-defines a certain amount of bits that can be used for error protection
and is unable to determine the optimal source and channel rates for a given bit
budget.
Most techniques, such as [Al-Regib 2005b, 2005c], employ Unequal Error
Protection (UEP) [Albanese 1996] approaches to protect each layer in the scalable
representation of the mesh. Recently, Al-Regib et al. improved their approach of
[Al-Regib 2002] and proposed a joint source and channel coding of the mesh [Al-
Regib 2005a]. The rate is allocated following a two-step optimization approach.
That is, given a total bit budget $B$ and a channel packet-loss rate $P_{LR}$, (a) an optimized total channel rate $C$ is determined using an exhaustive-search technique, and (b) optimized protection levels for each layer are then derived for each possible total channel rate, using again an exhaustive-search method. At the first step, the search is started assuming no error protection, namely a total channel rate $C = 0$. At each iteration, the total channel rate $C$ is increased by a predetermined increment $Q$. Having the total bit budget $B$ and the total channel rate $C$ set, an exhaustive search is performed to find the best compromise between the geometry-coordinates quantizer $l$ and the number of transmitted levels of detail $L$, $L \le M$. Once all the parameters $(l, L, C)$ are chosen, a local search algorithm finds the distribution of the total channel rate among all the packets, in other words the allocation of the $C$ bits over the $L$ transmitted batches, i.e. $C_L = \left[ C^{(1)}, C^{(2)}, \ldots, C^{(L)} \right]$.
It is important to observe that the approach of [Al-Regib 2005a] actually performs an independent source and channel coding, implying an iterative optimization of the source and channel coders rather than a joint optimization of the two. Furthermore, the exhaustive search for the optimum solution limits the applicability of [Al-Regib 2005a] in real-time application scenarios requiring on-the-fly adaptation to rapidly-varying channel conditions, including variable bandwidth and packet-loss rates. In practical systems, it is therefore of vital importance to employ fast algorithms for an optimized allocation of the error protection levels for each layer.
Li et al. [Li 2006] regard the problem of lossy transmission of meshes from the perspective of the error-prone channel, and propose as a solution a network-based error control scheme. Their idea is to build a middleware layer, between the application and the network, which organizes and transmits the 3D data based on the content and network conditions, such that the delay and reconstruction distortion at the receiver side are minimized, without employing any FEC or concealment techniques. Three progressive compression techniques are supported, namely Compressed Progressive Meshes [Pajarola 2000], Progressive Forest Split [Taubin 1998a], and Valence-Driven Conquest [Alliez 2001]. The first step is to parse the progressively compressed 3D data and structure it into two categories: critical data, essential to reconstruct the mesh, and refinement data, which is dispensable for the decoder to function, but necessary for improving the accuracy of the reconstruction. The critical data is sent to the decoder over a reliable channel, e.g. TCP, which copes with packet losses by resending them. The refinement data is organized in packets, which are sorted by their contribution to the quality of the reconstruction. Based on their importance and on the network conditions, e.g. packet loss rate, bandwidth and delay, these packets are transmitted over either reliable or unreliable channels. At the receiver, the decoder collects all received packets and reconstructs the mesh within the limits of the available data. Despite the interesting results, this technique is rather a complementary solution to an FEC-based error protection, which is still needed in order to (i) improve the accuracy of the reconstructed geometry by recovering more data packets, and (ii) at the same time reduce the delay by eliminating the need to retransmit packets.
Later developments in the area of JSCC of meshes include the work of Tian et al. [Tian 2007a]. They propose an error protection system designed for the transmission of 3D scenes consisting of multiple independent meshes. First, the plurality of 3D objects contained in the scene to be transmitted is weighted and sorted based on view-independent criteria, such as relative volumes, geometric complexity, and application semantics. Each mesh is then decomposed into a base mesh and multiple levels of detail. The base mesh is encoded using single-resolution compression methods such as those in [Taubin 1998b, Touma 1998]. To code the enhancement data, a spatial progressive technique is used, based on the Vector Quantization (VQ) method [Chou 2002], able to code jointly the vertices and the geometry. The mesh batches generated in this way are then weighted and sorted based on their relative improvement in quality to the full scene. The base meshes are transmitted over a reliable channel, such as TCP, and are assumed to be integrally received by the decoder. Next, the refinement data is protected using FEC codes such as Reed-Solomon codes [Rizzo 1997] in order to be transmitted over the error-prone channel. The rate allocation between the source and the channel is done iteratively in two major steps. In the first step, a set of FEC codes is computed for each source-channel rate scenario. In the second step, a steepest-descent search algorithm is performed, which finds the proper rate distribution between source and channel, under the given total rate constraint, based on the earlier computed weights. Once the rate allocation is performed, the packets are interleaved and transmitted over the error-prone channel. The proposed scheme is empirically proven to be efficient, though the rate allocation is performed separately for the source and the channel and therefore optimality cannot be claimed.
The joint source and channel coding algorithm proposed in this chapter follows a
joint optimization approach, in that, in contrast to the state-of-the-art technique of
[Al-Regib 2005a], the number of layers and the code rates for each layer are
simultaneously determined subject to a total bit budget. In our design, an unequal
error protection approach [Albanese 1996] is followed to account for the different
error-sensitivity levels characterizing the various resolution and quality layers. The
optimized rate-allocation is found by solving a JSCC problem, wherein the
estimated distortion is minimized subject to a total rate constraint. In this chapter we
propose a novel fast algorithm for solving the constrained-optimization problem,
whose complexity is lower than that of similar algorithms [Banister 2002] existing
in the literature. In contrast to the use of an exhaustive-search for the optimum
solution, the proposed fast optimization algorithm enables a real-time
implementation of the JSCC rate-allocation. The algorithm is applicable to any
scalable mesh codec and is instantiated in this chapter for the specific case of
MESHGRID [Salomie 2005, Salomie 2004b]. Furthermore, in contrast to other JSCC
methods existing in the literature, in our approach the JSCC problem is formulated
and solved for both the L-infinite and the classical L-2 distortion metrics. In terms of
performance, it is found that, similar to the error-free case [Cernea 2005], the L-infinite norm is a better option, particularly in low-rate coding of surfaces.
4.3 SCALABLE JOINT SOURCE AND CHANNEL
CODING OF MESHES
Let us consider the generic situation wherein the input 3D object is decomposed
into L different sources of information, each of these sources being progressively
encoded [ISO/IEC 2004]. The sources can be independent regions of interest, if we
consider a spatial-partitioning approach [Park 2006, 2003], or wavelet subbands, if
we consider a wavelet-based source coding approach, as in case of MESHGRID. In an
error-prone transmission scenario, the JSCC problem that needs to be solved is to
determine the amount of source information and the optimum protection levels to be
employed on each source such that the estimated distortion at the decoder site is
minimized subject to a constraint on the total source and channel rate. One can also
pose the alternative problem, wherein we seek to optimize the rate allocation such
that the total rate is minimized subject to a bound on the estimated average
distortion at the decoder site.
In order to formulate the JSCC problem, let us denote by $D_{tot}$ the spatial-domain distortion in the reconstructed mesh, and by $D_l$ the contribution to the total distortion of a given source $l$, $1 \le l \le L$. In an error-free transmission scenario, the spatial-domain distortion $D_{tot}$ is a linear combination of the distortions $D_l(R_{s,l})$ on source $l$, of the form:

$$D_{tot} = \sum_{l=1}^{L} q_l \, D_l(R_{s,l}) \qquad (4.1)$$
where $R_{s,l}$ is the source rate associated with source $l$, while the $q_l$'s weight the different distortion contributions in the total distortion. This additive distortion-metric model is generic. Indeed, in spatial-partitioning mesh-coding approaches, such as [Park 2006, 2003], the sources are regions of interest, hence $q_l = 1$ for all $l$, and the sources are independent. In wavelet-domain approaches, such as MESHGRID, each source is a subband progressively encoded in a bitplane-by-bitplane manner, $D_l(R_{s,l})$ is the source distortion-rate function associated with subband $l$, and the weights $q_l$ depend only on the distortion-metric type and the wavelet filter-bank employed.
In the case of an L-2 distortion metric, expression (4.1) for $D_{tot}$ is well-known, and the $q_l$ factors depend on the gains of the wavelet filters, as already discussed in section 3.5.2.
In the L-infinite case, $D_{tot}$ is the MAXAD. As already shown in section 3.4 and discussed in the literature [Alecu 2004, Alecu 2006, Alecu 2003b], the MAXAD can be expressed as a linear combination of distortions occurring in the wavelet domain, which are produced by scalar quantization of the wavelet subbands. Instantiations of (4.1) for images for a few common biorthogonal filter-banks are given in [Alecu 2003b]. In Chapter 3, we have extended these findings to meshes, by expressing the MAXAD as a function of wavelet-domain distortions produced by a scalable coding of the mesh. For 3D objects, the instantiation corresponding to the specific wavelet filter-bank employed in MESHGRID is given in section 3.5.4.
It is important to observe that in the case of scalable wavelet-based L-infinite-oriented coding, expression (4.1) is valid only for rates $R_{s,l}$ associated with a complete encoding of the corresponding bitplane in subband $l$ (see Chapter 3). In other words, in the L-infinite case it is impossible to express $D_{tot}$ for fractional bitplanes. For this reason, in a progressive transmission scheme where the MAXAD is the target distortion, a subband bitplane should be either completely received or completely missing. Consequently, in the L-infinite case, an error-resilient MESHGRID coding system should protect the source packets from a given subband bitplane using the same coding rate. In addition, losing a packet from a subband bitplane is equivalent to losing the entire bitplane.
In the following we proceed to the error-prone case and consider transmission over a packet-loss channel with total capacity $R_{tot}$. The proposed JSCC approach assumes that an interleaver is used in the transmission scheme. In this way, the packet-erasure channel commonly used to model modern packet-based networks translates into the binary-erasure channel (BEC) model used in our simulations.
The JSCC algorithm has to allocate the total rate $R_{tot}$ across all $L$ different
scalable sources and between the source and channel coders in such a way that the
overall estimated distortion is minimized subject to a rate constraint. In order to
solve this problem, a recursive formulation for the average expected distortion is
presented. Thereafter, we show that the JSCC problem can be solved via a
Lagrangian optimization technique and propose a novel fast algorithm to find a
near-optimum solution.
4.3.1 JSCC Formulations
We define the code rate $r$ of the error correction codes as $r = k/N$, where $k$ is the number of source bits and $N$ is the total number of bits in the codeword, and denote by $p_f(r, \varepsilon)$ the probability of losing a codeword that is transmitted over a BEC with erasure probability $\varepsilon$. The supported transmission scenarios are generic, in the sense that either $k$ is pre-defined (but not necessarily constant) and $N$ is variable, corresponding to a fixed-$k$ transmission mode, or $N$ is pre-defined (and in general constant) and $k$ is variable, corresponding to a fixed-$N$ transmission mode. We notice that a fixed-$N$ transmission mode incurs simplifications of the involved cost functions and of the corresponding JSCC algorithm. This mode has been thoroughly treated in JSCC for images and video by M. Stoufs in [Stoufs 2008].
Assume that each source $l$ is encoded in a scalable manner using a total number of layers $M_{l,tot}$. The JSCC problem requires determining (i) the number of layers $M_l$, $M_l \le M_{l,tot}$, that need to be protected and transmitted for each source, and (ii) the protection levels for each layer, expressed by the code rates $r_{l,i}$ used in codeword $i$, $0 < i \le M_l$, of source $l$. The average distortion $D_l(r_{l,0}, r_{l,1}, \ldots, r_{l,M_l})$ at the decoder site for source $l$ is a function of the code rates $r_{l,i}$, with $r_{l,0} = 0$ by convention. This distortion can be written as:

$$D_l(r_{l,0}, \ldots, r_{l,M_l}) = \sum_{m=0}^{M_l} \left[ \prod_{i=0}^{m} \left( 1 - p_f(r_{l,i}, \varepsilon) \right) \right] p_f(r_{l,m+1}, \varepsilon) \, D_{l,m}. \qquad (4.2)$$
We now denote $\alpha_{l,m} = \prod_{i=0}^{m} \left( 1 - p_f(r_{l,i}, \varepsilon) \right)$. From (4.2), together with the conventions $\alpha_{l,0} = 1$, $r_{l,0} = 0$ and $p_f(r_{l,M_l+1}, \varepsilon) = 1$, we can derive the following recursive formula:

$$D_l(r_{l,0}, \ldots, r_{l,M_l}) = D_l(r_{l,0}, \ldots, r_{l,M_l-1}) - \alpha_{l,M_l} \left( D_{l,M_l-1} - D_{l,M_l} \right). \qquad (4.3)$$
The code rates $(r_{l,1}, \ldots, r_{l,M_l})$ assigned to the $M_l$ codewords of source $l$ have to be chosen such that a minimal end-to-end distortion is achieved. Let $\Delta D_{l,m} \triangleq D_{l,m-1} - D_{l,m}$, $0 < m \le M_l$, denote the decrease in distortion resulting from successfully decoding codeword $m$. We name the set of code rates $(r_{l,0}, r_{l,1}, \ldots, r_{l,m})$ the path $\Pi_{l,m}$, i.e. $\Pi_{l,m} = (r_{l,0}, r_{l,1}, \ldots, r_{l,m})$, with $1 \le m \le M_{l,tot}$. The average expected distortion $D_l$ when taking path $\Pi_{l,M_l}$ is thus given by:

$$D_l(\Pi_{l,M_l}) = D_l(\Pi_{l,M_l-1}) - \alpha_{l,M_l-1} \left( 1 - p_f(r_{l,M_l}, \varepsilon) \right) \Delta D_{l,M_l}. \qquad (4.4)$$
Writing expression (4.4) for all $m$, $0 < m \le M_l$, and accounting for the fact that $D_l(\Pi_{l,0}) = D_l(0)$ leads to:

$$D_l(\Pi_{l,M_l}) = D_l(0) - \sum_{m=1}^{M_l} \alpha_{l,m} \, \Delta D_{l,m}. \qquad (4.5)$$
Equation (4.5) practically says that if a codeword $m$ is successfully received and decoded, then the average distortion in source $l$ decreases by $\alpha_{l,m} \Delta D_{l,m}$. From (4.1) and (4.5), we deduce that the total distortion is of the form:

$$D_{tot} = \sum_{l=1}^{L} q_l \, D_l(\Pi_{l,M_l}) = \sum_{l=1}^{L} q_l \, D_l(0) - \sum_{l=1}^{L} q_l \sum_{m=1}^{M_l} \alpha_{l,m} \, \Delta D_{l,m}. \qquad (4.6)$$
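The sketch below (our own Python illustration) evaluates the expected source distortion both with the direct form (4.2) and with the recursive form (4.5), and shows that the two agree. The codeword-loss model p_f used here is a made-up monotone function, since in practice these probabilities are measured off-line for the actual channel codes (cf. Table 4-1).

```python
import math

def p_f(r, eps):
    """Hypothetical codeword-loss probability on a BEC with erasure rate eps
    (a toy stand-in; the real values are measured off-line for each code)."""
    return 0.0 if r == 0.0 else min(1.0, (eps / (1.0 - r)) ** 2)

def expected_distortion_closed(D, rates, eps):
    """Direct evaluation of (4.2), with r_0 = 0 and p_f(r_{M+1}) = 1 by convention."""
    M = len(rates)                    # rates = [r_1, ..., r_M]
    r = [0.0] + rates                 # prepend r_0 = 0
    total = 0.0
    for m in range(M + 1):
        alpha_m = math.prod(1.0 - p_f(r[i], eps) for i in range(m + 1))
        p_next = 1.0 if m == M else p_f(r[m + 1], eps)
        total += alpha_m * p_next * D[m]
    return total

def expected_distortion_recursive(D, rates, eps):
    """Recursive form (4.5): D = D(0) - sum_m alpha_m * (D_{m-1} - D_m)."""
    total, alpha = D[0], 1.0
    for m, r in enumerate(rates, start=1):
        alpha *= 1.0 - p_f(r, eps)
        total -= alpha * (D[m - 1] - D[m])
    return total

D = [1.0, 0.5, 0.25, 0.12]            # D_{l,m}: distortion when the first m codewords decode
rates = [0.66, 0.75, 0.8]             # code rates r_{l,1}, ..., r_{l,M_l}
print(expected_distortion_closed(D, rates, 0.1))
print(expected_distortion_recursive(D, rates, 0.1))   # both forms agree
```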
This total distortion needs to be minimized subject to the rate constraint, or, equivalently, to a constraint on the total length (in bytes): $N_{tot} \le N_{target}$. The constrained minimization problem that needs to be solved is thus:

$$\min \; D_{tot} = \sum_{l=1}^{L} q_l \, D_l(\Pi_{l,M_l}) \quad \text{subject to} \quad N_{tot} = \sum_{l=1}^{L} N_l(\Pi_{l,M_l}) \le N_{target} \qquad (4.7)$$
This constrained minimization problem can be transformed into an unconstrained minimization problem wherein we seek to minimize the functional:

$$J = D_{tot} + \lambda N_{tot} = \sum_{l=1}^{L} \left[ q_l \, D_l(\Pi_{l,M_l}) + \lambda \, N_l(\Pi_{l,M_l}) \right], \qquad (4.8)$$
with $\lambda > 0$. Denote by $J_{l,m}(\Pi_{l,m}) = q_l \, D_l(\Pi_{l,m}) + \lambda \, N_l(\Pi_{l,m})$, for $0 < m \le M_l$, and $J_{l,0} = q_l \, D_l(0)$. From (4.4) it can be shown that $J_{l,m}$ satisfies the recursion:

$$J_{l,m}(\Pi_{l,m}) = J_{l,m-1}(\Pi_{l,m-1}) - q_l \, \alpha_{l,m} \, \Delta D_{l,m} + \lambda \, \frac{k_{l,m}}{r_{l,m}} \qquad (4.9)$$
for all $m$, $1 \le m \le M_l$. The functional to be minimized is thus:

$$J = \sum_{l=1}^{L} J_{l,M_l}(\Pi_{l,M_l}). \qquad (4.10)$$
In order to minimize (4.10), we follow a Lagrangian-optimization approach and determine (a) the optimum number of most significant layers $M_l$ that need to be protected and sent for each source, as well as (b) the optimum set of code-rates (paths) $\Pi_{l,M_l}$ for each source. This approach will be detailed in the next section.
At the end of this section, it is important to observe that if the objective function $D_{tot}(N_{tot})$ is convex, then the necessary and sufficient Karush-Kuhn-Tucker conditions [Kuhn 1951] establish that the solution to the unconstrained minimization of the functional in (4.10) is a global minimum for problem (4.7). In general, one cannot claim that for all possible paths $\Pi_{l,M_l}$ the ensuing $D_{tot}(N_{tot})$ is convex. That is, in general, one cannot claim global optimality of the solution to (4.10). Still, $D_{tot}(N_{tot})$ can be made convex iff $J$ expressed by (4.10) is convex, which is ensured if $J_{l,M_l}(\Pi_{l,M_l})$ is convex for every $l$, $1 \le l \le L$. In our approach, this is ensured by retaining from the computed candidate paths only those paths $\Pi_{l,M_l}$ for which $J_{l,M_l}(\Pi_{l,M_l})$ is convex, as explained next. The end result of this limitation is that, conditioned on the considered paths, the solution to (4.10) is optimal. Since not all possible paths are considered, the resulting solution is not, in general, a global optimum of the constrained optimization problem (4.7).
4.3.2 Optimized Rate-Allocation
Let us suppose that we find a solution set $(M_l^*, \Pi_{l,M_l^*}^*)$ that minimizes $J$ for some $\lambda > 0$. This solution set is necessarily optimal in the sense that the distortion $D_{tot}$ cannot be further reduced without increasing the length $N_{tot}$, or vice-versa. Thus, if we find a value of $\lambda$ such that the corresponding set $(M_l^*, \Pi_{l,M_l^*}^*)$ minimizes (4.10) and at the same time satisfies the length constraint $N_{tot} \le N_{target}$, then this must be the solution to our constrained optimization problem.
It is clear from the assumed additive property of the distortion that minimizing $J$ for a given $\lambda$ is equivalent to minimizing every $J_{l,M_l}$ for that $\lambda$. In order to find $(M_l^*, \Pi_{l,M_l^*}^*) = \arg\min_{M_l, \Pi_{l,M_l}} J_{l,M_l}(\Pi_{l,M_l})$ for every $l$, we proceed recursively, as suggested by the recursive formula (4.9).
Denote by $d$ the total number of available protection levels (or code-rates), and by $\mathcal{R}$ the set of possible code-rates. At the first step, $J_{l,0} = q_l \, D_l(0)$, which is the starting point of the minimization algorithm. Next, protecting the first layer in source $l$ with a code-rate $r_{l,1} \in \mathcal{R}$ leads to:

$$J_{l,1} = q_l \, D_l(0) - q_l \left( 1 - p_f(r_{l,1}, \varepsilon) \right) \Delta D_{l,1} + \lambda \, \frac{k_{l,1}}{r_{l,1}}. \qquad (4.11)$$
We initialize $d$ paths, denoted by $\Pi_{l,1}$, with $\Pi_{l,1} = r_{l,1}$, for every $r_{l,1} \in \mathcal{R}$. Since there are $d$ possibilities to protect the most significant layer, there will be $d$ cost values $J_{l,1}(\Pi_{l,1})$ along these paths. Out of these, we can find $\Pi_{l,1}^* = \arg\min_{\Pi_{l,1}} J_{l,1}(\Pi_{l,1})$, and store all the values of $J_{l,1}$ for further use.
Recursively, at the next step $m$, we calculate $J_{l,m}(\Pi_{l,m-1}, r_{l,m})$ for every $r_{l,m} \in \mathcal{R}$ based on the previously calculated values $J_{l,m-1}(\Pi_{l,m-1})$, using (4.9). Out of these, we determine:

$$\Pi_{l,m} = \left( \arg\min_{\Pi_{l,m-1}} J_{l,m}(\Pi_{l,m-1}, r_{l,m}), \; r_{l,m} \right), \quad \text{for every } r_{l,m} \in \mathcal{R}. \qquad (4.12)$$
We note that the operation $C = (A, B)$ above indicates the concatenation of vector $A$ with scalar $B$ to obtain a higher-dimensional vector $C$. Out of the retained $d$ paths $\Pi_{l,m}$, we determine the minimum path $\Pi_{l,m}^*$ at step $m$ as that path for which $J_{l,m}(\Pi_{l,m})$ is convex and:

$$\Pi_{l,m}^* = \arg\min_{\Pi_{l,m}} J_{l,m}(\Pi_{l,m}). \qquad (4.13)$$
The procedure is repeated recursively for all $m$, $m \le M_{l,tot}$. This procedure shows that for every $m$ we have a different path $\Pi_{l,m}^*$ that minimizes $J_{l,m}$, as illustrated in Figure 4-2. The figure gives a pictorial representation of the computed paths and the minimum ones $\Pi_{l,m}^*$ determined by the algorithm at every recursion. As shown in the figure, in practice, some of the minimum paths $\Pi_{l,p}^*$ and $\Pi_{l,q}^*$ with $p \ne q$ might be completely non-overlapping, while others might be partially or completely overlapping.
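The per-source recursion of equations (4.9)-(4.13) can be sketched in a few lines of Python. The sketch below is a simplified illustration rather than the actual implementation: the codeword-loss model p_f is hypothetical, the distortion of a fully decoded source is assumed to be zero, and the convexity filter applied to the retained paths in the thesis is omitted for brevity.

```python
import math

def p_f(r, eps):
    """Hypothetical codeword-loss probability on a BEC (measured off-line in practice)."""
    return min(1.0, (eps / (1.0 - r)) ** 2)

def best_paths_for_source(dD, k, q, lam, rates, eps):
    """Recursive path construction of Section 4.3.2 (sketch of eqs. (4.9)-(4.13)).

    dD[m]  : distortion decrease Delta D_{l,m+1} gained by codeword m+1
    k[m]   : number of source bits in codeword m+1
    q      : weight q_l of the source
    lam    : Lagrangian multiplier lambda
    rates  : available code rates (the set R, with d = len(rates))
    Returns (M_l, path of code rates, cost), i.e. how many layers to send and how to protect them.
    """
    M_tot = len(dD)
    J0 = q * sum(dD)                      # J_{l,0} = q_l * D_l(0), assuming D_l(M_tot) = 0
    # Each retained candidate carries (cost J, survival probability alpha, list of rates).
    retained = [(J0, 1.0, [])]
    best = (J0, [])                       # best over all m so far (M_l = 0 initially)
    for m in range(M_tot):
        new_retained = []
        for r in rates:                   # one extended path per candidate rate (eq. 4.12)
            cands = []
            for (J_prev, a_prev, path) in retained:
                a = a_prev * (1.0 - p_f(r, eps))
                J = J_prev - q * a * dD[m] + lam * k[m] / r   # recursion (4.9)
                cands.append((J, a, path + [r]))
            new_retained.append(min(cands, key=lambda c: c[0]))
        retained = new_retained
        J_star, _, path_star = min(retained, key=lambda c: c[0])   # Pi*_{l,m}, eq. (4.13)
        if J_star < best[0]:
            best = (J_star, path_star)    # M_l* = argmin_m J*_{l,m}
    return len(best[1]), best[1], best[0]

# Toy example: 4 layers with deadzone-type distortion decreases (the L-infinite case).
dD = [2.0 ** (4 - m) for m in range(1, 5)]
k = [2048, 2048, 2048, 2048]
print(best_paths_for_source(dD, k, q=1.0, lam=1e-3, rates=[0.66, 0.75, 0.8, 0.84], eps=0.1))
```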


Figure 4-2: Construction of the paths $\Pi_{l,m}^*$ for a set $\mathcal{R} = \{r_1, r_2, r_3, r_4\}$ (total number of available code-rates $d = 4$) and $m$ codewords, with $1 \le m \le 4$. The figure illustrates the computed paths at every recursion (dashed lines) and the minimum paths $\Pi_{l,m}^*$ (solid lines).
The solution set $(M_l^*, \Pi_{l,M_l^*}^*)$ is determined by identifying the number of layers for which $J_{l,m}(\Pi_{l,m}^*)$ is minimal, that is: $M_l^* = \arg\min_m J_{l,m}(\Pi_{l,m}^*)$.
For any given value of $\lambda$, the JSCC solution $(M_l^*, \Pi_{l,M_l^*}^*)_{1 \le l \le L}$ is determined by repeating the above algorithm for all sources. Similar to the optimization approach employed in the case of images (e.g. JPEG2000 [Taubman 2002]), the optimum value of $\lambda$ is taken as the minimum value for which the rate constraint is still satisfied:

$$\lambda^{opt} = \min \left\{ \lambda \;\middle|\; \sum_{l=1}^{L} N_l\!\left( \Pi_{l,M_l^*}^* \right) \le N_{target} \right\}. \qquad (4.14)$$
The search for $\lambda^{opt}$ can be performed using the classical bisection method, for instance, wherein a working interval $\lambda \in (\lambda_{min}, \lambda_{max})$ is successively halved until a stopping criterion on the size of the interval is met [Taubman 2002].
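A minimal sketch of this bisection is given below; total_length is a toy monotone stand-in for the total length $\sum_l N_l(\Pi_{l,M_l^*}^*)$ returned by the rate-allocation at a given $\lambda$ (in the real system it would rerun the per-source path construction), so the numbers are purely illustrative.

```python
def total_length(lam):
    """Stand-in for the total length selected by the rate allocation at a given lambda.
    The only property the bisection relies on is that it is non-increasing in lambda."""
    return 20000.0 / (1.0 + 50.0 * lam)

def find_lambda(n_target, lam_lo=0.0, lam_hi=1.0, tol=1e-6):
    """Bisection for lambda_opt (eq. 4.14): the smallest lambda whose allocation fits n_target."""
    while total_length(lam_hi) > n_target:   # grow the upper bound until the constraint holds there
        lam_hi *= 2.0
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if total_length(mid) <= n_target:
            lam_hi = mid          # constraint met: try a smaller lambda
        else:
            lam_lo = mid          # constraint violated: increase lambda
    return lam_hi

print(find_lambda(n_target=11430))   # bytes, e.g. an 11.43 kB operating point
```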
It is clear that, at the level of a source, not all possible paths are considered by the proposed algorithm. That is, even if the $J$ expressed by (4.10) is convex, not all possible paths $\Pi_{l,M_l}$ have been considered when solving the unconstrained optimization problem (4.10). In other words, global optimality of the solution to the constrained optimization problem (4.7) cannot be claimed. On the other hand, limiting the number of candidate paths is of crucial importance, as an exhaustive search for the global optimum is computationally intractable in a practical application. Indeed, if we express the total complexity of the algorithm in terms of the total number of paths that would need to be computed per source, then in the case of an exhaustive search the complexity is of order $O(d^M)$, where $M$ is the total number of layers. The complexity of the proposed algorithm decreases significantly, to $O(d^2 M)$. Moreover, compared to the algorithm proposed by Banister in [Banister 2002], which has a complexity of the order $O(d M^2)$, our algorithm provides a significant reduction in complexity, since $M$ is typically much larger than $d$.
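Taking the orders of growth as reconstructed above, a quick numerical comparison for an illustrative configuration (d = 5 code rates, M = 40 layers, both invented for the example) conveys the size of the gap:

```python
# Paths computed per source under each strategy (illustrative orders of growth only).
d, M = 5, 40
print(d ** M)        # exhaustive search: O(d^M)
print(d * M ** 2)    # [Banister 2002]:   O(d * M^2)
print(d * d * M)     # proposed:          O(d^2 * M), smaller whenever d < M
```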
In the following section, we concentrate on LDPC codes, which are the FEC
codes employed in this chapter.
4.3.3 Low-Density Parity-Check Codes
An LDPC code is a linear block code [Ryan 2003]. In general, linear block codes are described in terms of matrices: a generator matrix $G$ of dimension $k \times N$ and its dual parity-check matrix $H$ of dimension $(N-k) \times N$, where $k$ is the number of source bits and $N$ is the total number of bits in the codeword. The generator matrix $G$ represents a set of basis vectors spanning a $k$-dimensional subspace of $\mathbb{F}_2^N$, such that any codeword is a linear combination of the rows of $G$. Thus, the generator matrix $G$ defines the mapping of a source word to a codeword. Each row of the parity-check matrix $H$ defines a linear constraint satisfied by all codewords. Also, $H G^T = 0$, such that $H$ can be used to detect errors in a received word that has possibly been corrupted by noise.
The main characteristics of an LDPC code are that (i) the parity-check matrix H
is sparse, i.e. a matrix with a low number of ones and a large number of zeros, and
(ii) the decoding is performed iteratively using a so-called message-passing
algorithm.
The iterative decoding process is easily explained using a Tanner graph [Tanner
1981] representation of the parity-check matrix. A Tanner graph consists of variable nodes and $N-k$ check nodes. Connections between the two sorts of nodes are realized according to the positions of the ones in the matrix and are called edges. A
regular LDPC code is characterized by a low and fixed number of ones in the
columns (also called left or variable degree) and a low and fixed number of ones in
the rows (also called right or check degree). LDPC codes with a variable amount of
ones in the rows and columns are called irregular LDPC codes. For extensive
information on the iterative decoding process see [Ryan 2003].
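To make the iterative erasure decoding concrete, the sketch below (our own illustration) runs the standard peeling decoder over a binary erasure channel on a small, randomly generated sparse parity-check matrix and estimates the codeword-loss probability by simulation. The random construction used here is not PEG and the parameters are arbitrary, so the resulting numbers do not correspond to the codes of Table 4-1.

```python
import random

def random_sparse_H(n, n_checks, col_weight, rng):
    """Random sparse parity-check matrix, stored as the list of check nodes per column."""
    return [rng.sample(range(n_checks), col_weight) for _ in range(n)]

def peeling_decode(cols, n_checks, erased):
    """Iterative erasure decoding: a check node connected to exactly one erased variable
    determines that variable; repeat until no erasures remain or no progress is made."""
    erased = set(erased)
    progress = True
    while erased and progress:
        progress = False
        count = [0] * n_checks
        single = [None] * n_checks
        for v in erased:
            for c in cols[v]:
                count[c] += 1
                single[c] = v
        for c in range(n_checks):
            if count[c] == 1 and single[c] in erased:
                erased.discard(single[c])   # recovered from the single unknown in this check
                progress = True
    return not erased                        # True if every erasure was resolved

def estimate_failure_prob(n, rate, eps, trials, seed=0):
    rng = random.Random(seed)
    n_checks = int(round(n * (1.0 - rate)))
    cols = random_sparse_H(n, n_checks, col_weight=3, rng=rng)
    failures = sum(
        not peeling_decode(cols, n_checks, [v for v in range(n) if rng.random() < eps])
        for _ in range(trials)
    )
    return failures / trials

# Roughly (3,6)-like rate-1/2 code over a BEC with 30% erasures (illustrative only).
print(estimate_failure_prob(n=512, rate=0.5, eps=0.3, trials=100))
```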
Different ways to design good LDPC codes with prescribed properties have been proposed in the literature [Gallager 1963, Hu 2005, Kou 2001, Luby 2001, Lucas 2000, MacKay 1999, Richardson 2001]. Good LDPC codes are typically achieved when the girth of the LDPC matrix is maximized [Hu 2005]. The girth is the smallest loop or cycle that can be found in a Tanner graph. A simple and efficient method to construct such LDPC codes is the pseudo-random Progressive Edge-Growth (PEG) construction method proposed by Hu et al. [Hu 2005]. This method is also used in this dissertation. Using PEG, edges (connections in the Tanner graph) are assigned one at a time. For each variable node from 1 to N, the first edge is randomly assigned to a check node among those of lowest degree, while the other edges are assigned to check nodes that are not among the neighbors of the variable node up to depth-l in the current graph. It is noted that this construction method can be used both for regular and irregular LDPC codes and results in good short-length LDPC codes as long as the Tanner graph is optimized [Hu 2005].
4.4 EXPERIMENTAL RESULTS
In this section, an instantiation of the proposed scalable JSCC approach is
demonstrated by using MESHGRID [Salomie 2005, Salomie 2004a] as the input
scalable source coding technique. For channel coding, we employ punctured regular
(3,6)-LDPC codes [Lin 2004] for which we measured the statistical performance
off-line (see Table 4-1). The protection levels can be chosen from a set of five
LDPC codes of progressive strength. The relatively small amount of data consumed
by the header-information and connectivity-wireframe is protected by using the
strongest LDPC codes. Hence, the rate constraint is used only for the reference-grid
data, which is protected in an optimized manner using the proposed JSCC approach.
In this case, the L sources of information in equation (4.1) refer to the reference-grid
only, each source being a wavelet subband that has been progressively encoded in a
bitplane-by-bitplane manner.
Table 4-1: Average probability of packet loss for the UEP punctured regular (3,6) LDPC codes when transmitted over BECs with ε = 5%, 10%, 20% and 30% erasures.

BEC with 5% erasures
Code Number   Code Rate   Probability of Failure
     1          0.809          0.00E+00
     2          0.816          1.00E-06
     3          0.824          2.83E-04
     4          0.832          3.24E-02
     5          0.840          1.84E-01

BEC with 10% erasures
Code Number   Code Rate   Probability of Failure
     1          0.758          0.00E+00
     2          0.773          4.84E-05
     3          0.781          1.21E-03
     4          0.789          3.63E-02
     5          0.797          1.49E-01

BEC with 20% erasures
Code Number   Code Rate   Probability of Failure
     1          0.652          0.00E+00
     2          0.668          4.57E-05
     3          0.676          4.12E-04
     4          0.684          7.30E-03
     5          0.695          1.22E-01

BEC with 30% erasures
Code Number   Code Rate   Probability of Failure
     1          0.555          0.00E+00
     2          0.570          3.12E-05
     3          0.578          6.19E-04
     4          0.586          6.33E-03
     5          0.594          4.04E-02

In our approach, the JSCC problem is formulated and solved for both the L-infinite and the classical L-2 distortion metrics. In the L-2 case, the $\Delta D_{l,m}$ used in (4.9) represents the decrease in distortion between two successive truncation points, which can be estimated similarly to the solution adopted within JPEG-2000 [Taubman 2002, Verdicchio 2006]. In the L-infinite case, the eligible truncation points are only the ends of the bitplanes [Alecu 2006, Alecu 2003b]. Since the embedded quantizers employed by MESHGRID are the classical successive approximation quantizers [Salomie 2005, Salomie 2004a], the distortion $D_{l,m}$ in (4.2) is the MAXAD occurring at bitplane $m$, which is induced by the quantizer deadzone [Alecu 2004, Alecu 2006]. That is, $D_{l,m} = 2^{M_l - m}$, implying that in (4.9) one uses $\Delta D_{l,m} = 2^{M_l - m}$.
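As a small illustration of these deadzone-induced values (with an arbitrarily chosen number of bit-planes):

```python
M_l = 6   # number of bit-planes in subband l (arbitrary choice for the example)
for m in range(1, M_l + 1):
    D_m = 2 ** (M_l - m)                  # MAXAD left after receiving bit-planes 1..m
    delta_D_m = 2 ** (M_l - m + 1) - D_m  # = D_{l,m-1} - D_{l,m} = 2^(M_l - m)
    print(m, D_m, delta_D_m)
```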
4.4.1 UEP Performance Overview
The first set of experiments is intended to emphasize the importance of using
error protection in scalable coding and transmission of 3D models over error-prone
channels. Additionally, the experiments assess the execution speed of the proposed
rate-allocation algorithm and demonstrate the practical applicability of the proposed
approach in real-time applications.
Figure 4-3 shows the original Heart model, which is coded and transmitted with
and without error protection over a BEC with 10% bit erasures, at the same target
bit-rate. The first image, from left to right, illustrates the original 3D model, the
Figure 4-3: The Heart model (from left to right) decoded at (1) full resolution in an
error-free case, and after being transmitted over a BEC with 10% bit erasures at
18kB using (2) the proposed UEP approach, and (3) the standard MESHGRID codec
(NEP).
second image represents the decoded mesh at the client side if the bitstream is
protected against errors using the proposed JSCC approach, while the third image
represents the decoded mesh if No Error Protection (NEP) is performed (i.e. the
standard MESHGRID codec is used). These results demonstrate that the proposed
JSCC approach is capable of sustaining 10% bit erasures without any visual
artefacts, while the standard MESHGRID codec is significantly affected. While these
experiments demonstrate the benefits brought by the error-resilient coding, we note
that the NEP scheme performs better if the channel is error free, which is due to the
redundant information added by any error-protection scheme.
Additionally, we measured the performance of the proposed JSCC method (in
terms of MAXAD and execution time versus bit erasure rate), and compared it
against that of the standard MESHGRID codec. Our setup used to make the time measurements is a PC running Windows XP SP3 with an Intel Core 2 Duo processor at 2.40 GHz and 2 GB of RAM. The results shown in Figure 4-4 and
Figure 4-5 demonstrate that for increasing bit erasure rates the MAXAD differences
are very large. At the same time, even if the execution time of the JSCC version is
almost double compared to that of the standard MESHGRID codec (see Figure 4-5), it
still stays in the range of tens of milliseconds, for an object of moderate complexity,
containing 15950 vertices and 42312 triangles.
Based on these experiments, one concludes that (i) providing error-resilience is of
paramount importance in scalable mesh coding and transmission over error-prone
channels, and (ii) real-time implementations for the proposed JSCC approach are
easy to achieve, even on much less powerful devices, such as portable devices.
[Bar charts: execution time (ms) and MAXAD (%) for NEP versus UEP, at bit erasure rates of 5%, 10% and 20% and target sizes between 0.61 kB and 11.43 kB.]

Figure 4-4: Performance of the proposed JSCC algorithm (UEP) compared to that
of the standard MESHGRID codec (NEP). The graphs depict the results obtained on
the Heart model. The MAXAD is reported in %, expressing the maximum variation
of the vertex-positions relative to the diagonal of the bounding box containing the
object.
4.4.2 UEP vs. Equal Error Protection
In a second set of experiments, the proposed UEP approach is compared against
an Equal Error Protection (EEP) method. In the EEP case, the source layers have
been equally protected using the strongest possible FECs (from the available UEP
ones) for the considered bit-erasure probability. We have used two MESHGRID
[Bar charts: execution time (ms) and MAXAD (%) for NEP versus UEP, at bit erasure rates of 5%, 10% and 20% and target sizes between 0.57 kB and 18.70 kB.]

Figure 4-5: Performance of the proposed JSCC algorithm (UEP) compared to that
of the standard MESHGRID codec (NEP). The graphs depict the results obtained on
the Humanoid model. The MAXAD is reported in %, expressing the maximum
variation of the vertex-positions relative to the diagonal of the bounding box
containing the object.
models, for which scalable coding and error-protection are applied. The protected
streams are transmitted over BECs with 5%, 10%, 20% and 30% of bit erasures.
Both UEP and EEP are compared at the same set of target bit-rates. The simulated
transmission of the models over the error-prone channel is repeated 1000 times for
each setup, and the average results, in terms of distortion and execution time, are
determined (see Table 4-2, Figure 4-6 and Figure 4-7). The results show that the
proposed JSCC UEP-based solution (i) yields superior performance compared to
EEP, and (ii) requires a negligible execution time, making it suitable for real-time
applications.
[Bar charts: MAXAD (%) and execution time (ms) for EEP, UEP[Al-Regib] and the proposed UEP, at bit erasure rates of 5%, 10%, 20% and 30% and various target sizes.]

Figure 4-6: Performance of the proposed JSCC algorithm (UEP) compared to EEP
and the state-of-the-art (UEP[Al-Regib] [Al-Regib 2005a]). The graphs depict the
results for the Heart model. The MAXAD is reported in %, expressing the
maximum variation of the vertex-positions relative to the diagonal of the bounding
box containing the object.
We note that, due to a limited database of MESHGRID models available at the time
of the experiments, a limited number of meshes have been used to obtain these
results. However, the employed models cover a large diversity of data patterns, which gives us a reasonable degree of confidence that these results generalize. The Humanoid model is composed of three levels of resolution
and employs a highly irregular reference-grid, while the Heart model has five levels
of resolution and a smooth reference-grid. Nevertheless, a wide range of models
have been made recently available for MESHGRID, hence, we intend to conduct
additional experiments in our prospective work.
[Figure 4-7 charts: MAXAD in % and execution time in ms for EEP, UEP[Al-Regib] and the proposed UEP, at 5%, 10%, 20% and 30% bit erasures and three target rates per erasure rate.]

Figure 4-7: Performance of the proposed JSCC algorithm (UEP) compared to EEP
and the state-of-the-art (UEP[Al-Regib] [Al-Regib 2005a]). The graphs depict the
results for the Humanoid model. The MAXAD is reported in %, expressing the
maximum variation of the vertex-positions relative to the diagonal of the bounding
box containing the object.

4.4.3 UEP vs. State of the Art
We have also compared the proposed JSCC approach against the state-of-the-art
JSCC algorithm described in [Al-Regib 2005a], with both using MESHGRID as the scalable source-coding technique. In conceptual terms, the approach of [Al-Regib 2005a] scans all the possible total channel rates $C_p = pQ$, with $0 \le p \le \lfloor B/Q \rfloor$, where B is the available bandwidth, $\lfloor \cdot \rfloor$ denotes the integer part, p is an integer, and Q is the rate-step in bits. For each p, the algorithm of [Al-Regib 2005a] determines (a) an optimal source rate allocation for the corresponding source rate $S_p = B - C_p$, and (b) an optimized distribution of the protection levels to be employed for the source layers determined at step (a), given the total channel rate $C_p$.
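To make the iterative nature of this scheme concrete, the following Python sketch outlines the scan over candidate channel rates. The helpers allocate_source_rate and optimize_protection are hypothetical placeholders for steps (a) and (b); they are not part of [Al-Regib 2005a] and are named here purely for illustration.

    def scan_channel_rates(B, Q, allocate_source_rate, optimize_protection):
        # Exhaustive scan over total channel rates C_p = p*Q, 0 <= p <= floor(B/Q).
        best = None
        for p in range(int(B // Q) + 1):
            C_p = p * Q                                   # candidate channel rate (bits)
            S_p = B - C_p                                 # bits left for the source coder
            layers = allocate_source_rate(S_p)            # step (a), placeholder
            distortion = optimize_protection(layers, C_p) # step (b), placeholder
            if best is None or distortion < best[0]:
                best = (distortion, layers, C_p)
        return best

The number of iterations is floor(B/Q) + 1, and every iteration re-runs both inner optimisations, which is consistent with the large execution times reported for UEP[Al-Regib] in Table 4-2 and with the observation below that decreasing Q for better precision makes the gap even larger.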
Table 4-2: Experimental results: comparison between EEP, the proposed UEP approach and the state-of-the-art UEP[Al-Regib] [Al-Regib 2005a] for different target rates (Ntarget) and bit-erasure rates (5%, 10%, 20%, 30%).

Heart model
                               EEP                 UEP[Al-Regib]           UEP
Erasures  Ntarget(kB)   MAXAD%   Time(ms)    MAXAD%   Time(ms)    MAXAD%   Time(ms)
5%        0.63          4.26%    16          4.26%    1103        4.20%    34
5%        3.28          1.90%    20          1.90%    4755        1.88%    47
5%        11.43         0.36%    23          0.37%    9040        0.36%    55
10%       0.62          5.29%    16          5.29%    671         4.23%    29
10%       3.20          2.00%    21          2.00%    2981        1.90%    40
10%       11.30         0.47%    24          0.38%    7218        0.38%    47
20%       0.61          5.30%    16          5.30%    827         5.30%    28
20%       3.34          2.34%    20          2.34%    3206        2.00%    39
20%       11.40         0.68%    23          0.68%    6957        0.47%    47
30%       0.64          5.61%    11          5.61%    635         5.32%    21
30%       3.14          2.43%    13          2.43%    2401        2.31%    29
30%       11.75         0.71%    15          0.71%    6415        0.69%    33

Humanoid model
                               EEP                 UEP[Al-Regib]           UEP
Erasures  Ntarget(kB)   MAXAD%   Time(ms)    MAXAD%   Time(ms)    MAXAD%   Time(ms)
5%        0.58          3.69%    12          3.69%    553         3.60%    23
5%        4.90          0.81%    18          0.66%    4626        0.66%    39
5%        18.50         0.05%    22          0.05%    12024       0.04%    48
10%       0.57          3.97%    12          3.70%    489         3.70%    20
10%       4.80          0.81%    17          0.81%    3633        0.81%    35
10%       18.45         0.07%    21          0.07%    9925        0.07%    41
20%       0.57          3.96%    12          3.98%    452         3.98%    19
20%       4.77          0.99%    17          0.90%    3587        0.90%    33
20%       18.70         0.14%    21          0.10%    10185       0.09%    41
30%       0.63          4.87%    10          4.87%    469         3.98%    17
30%       5.38          0.99%    12          0.90%    4120        0.89%    25
30%       18.33         0.18%    14          0.18%    9885        0.18%    30

In our experiments, the value of the step parameter Q was set to 1000 bits, as in
[Al-Regib 2005a], except for the experiments with a very low bit budget, where Q
was decreased to obtain a better precision of the algorithm. Apart from this, the same
operational settings as in the previous set of experiments have been used. The results
are reported in Table 4-2. We notice that the two algorithms provide comparable
distortions, but the difference in execution time is extreme. The reason for the
dramatic gap in execution time is the iterative nature of [Al-Regib 2005a]. Basically,
in [Al-Regib 2005a] the source and channel rates are not jointly optimized; each step $p$, $0 \le p \le \lfloor B/Q \rfloor$, corresponds to a certain distribution of the total bandwidth
among the source and channel codecs. The step-size in rate Q has to be relatively
small, in order to produce an accurate rate allocation. In our settings this is indeed
the case, as reflected by the comparable distortion figures produced by the two
algorithms. Increasing Q reduces the number of iterations (hence the execution
time), but this also reduces the accuracy, by significantly worsening the obtained
results. For example, for the Heart model setup, at 10% bit erasures and 3.20kB
target rate, if Q is increased to 4000 bits, the execution time drops to 647 ms, but
the distortion produced by [Al-Regib 2005a] increases to 2.34%.
We conclude that the numerical comparisons in terms of distortion and the huge
differences in execution time clearly favour the proposed JSCC approach against the
state-of-the-art of [Al-Regib 2005a].
We also investigated the performance differences between the proposed JSCC, which employs the fast rate-allocation algorithm, and an exhaustive-search technique that finds the optimum protection levels. Although optimality for the proposed JSCC
cannot be claimed, experimental results obtained on two models (Heart and
Humanoid), using five protection levels, for 10% and 30% bit erasures at three
different target rates, demonstrate that there are no notable performance differences
between the proposed JSCC approach and the exhaustive-search technique.
4.4.4 Graceful Degradation
In a fourth set of experiments, we demonstrate the graceful degradation of the
proposed UEP approach, by comparing it against EEP, both schemes operating
under the same channel conditions (channel capacity and bit erasure rate). For these
experiments, three MESHGRID objects have been used, i.e. the Feline model at
172.65kB target rate, shown in Figure 4-8 (a), the Mars Surface model at 159.99kB
target rate, shown in Figure 4-8 (b), and the Swiss Landscape model at 34.75kB
target rate, shown in Figure 4-8 (c). For the first series of experiments, the
bitstreams are protected assuming 20% bit erasures and are transmitted over BEC
channels with different actual error-rates, ranging from 17% (implying
overprotection) to 25% (implying under-protection). For the particular LDPC codes employed, 17% was the most overprotected scenario worth testing: below 17%, all the codewords are guaranteed to be correctly decoded at the client side. Above 25%, most codewords are lost and the decoded meshes are significantly distorted. Similarly, a second set of experiments is performed with error protection assuming 30% bit erasures, while transmitting over BEC channels with actual error-rates ranging from 28% (implying overprotection) to 35% (implying under-protection). The results are
summarized in Table 4-3.
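To give an intuition for these mismatch experiments, the toy Python sketch below transmits codewords over a binary erasure channel and counts how many can no longer be recovered. The threshold decoding rule (a codeword is lost once its erasure fraction exceeds a nominal decoding threshold) is a deliberate simplification of LDPC decoding; the threshold value, codeword size and trial counts are illustrative assumptions, not parameters of the actual codes used in the thesis.

    import random

    def lost_codeword_fraction(actual_rate, threshold, bits_per_codeword=1000,
                               codewords=200, trials=50, seed=0):
        # Fraction of codewords whose erasure fraction exceeds the decoding
        # threshold, under a simplified (non-LDPC) decoding model.
        rng = random.Random(seed)
        lost = 0
        for _ in range(trials):
            for _ in range(codewords):
                erased = sum(rng.random() < actual_rate
                             for _ in range(bits_per_codeword))
                if erased / bits_per_codeword > threshold:
                    lost += 1
        return lost / (trials * codewords)

Such a toy model reproduces the cliff-like behaviour visible in Figures 4-9 and 4-10: almost nothing is lost while the actual erasure rate stays below the decoding threshold, and losses grow rapidly once it is exceeded; the exact position of the cliff for the LDPC codes used in the thesis differs from this simplified rule.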
(a)
Figure 4-8 (part 1 of 2): Graceful degradation of the MESHGRID meshes: (a) Feline model.
(b)
(c)
Figure 4-8 (part 2 of 2): Graceful degradation of the MESHGRID meshes: (b) Mars
Surface model, and (c) Swiss Landscape model.
Table 4-3: EEP versus proposed UEP approach for channel mismatches; 20% and
30% BEC are assumed, while the actual bit erasure rate is in the range 18%-25%
and 28%-35% respectively. The results are for the meshes: Feline (172.65kB), Mars
Surface (159.99kB) and Swiss Landscape (34.75kB).
Feline (172.65 kB)
  Protection designed for 20% BEC
  Channel error rate   18%    19%    20%    21%    22%    23%    24%    25%
  EEP (MAXAD)          1.08%  1.08%  1.08%  1.08%  1.09%  1.19%  1.59%  1.90%
  UEP (MAXAD)          1.08%  1.08%  1.08%  1.08%  1.10%  1.19%  1.46%  1.84%
  Protection designed for 30% BEC
  Channel error rate   28%    29%    30%    31%    32%    33%    34%    35%
  EEP (MAXAD)          1.29%  1.29%  1.29%  1.29%  1.29%  1.32%  1.52%  1.84%
  UEP (MAXAD)          1.29%  1.29%  1.29%  1.29%  1.30%  1.35%  1.47%  1.71%

Mars Surface (159.99 kB)
  Protection designed for 20% BEC
  Channel error rate   18%    19%    20%    21%    22%    23%    24%    25%
  EEP (MAXAD)          0.00%  0.00%  0.01%  0.01%  0.76%  7.62%  25.26% 48.55%
  UEP (MAXAD)          0.00%  0.00%  0.00%  0.01%  0.12%  1.50%  10.95% 30.84%
  Protection designed for 30% BEC
  Channel error rate   28%    29%    30%    31%    32%    33%    34%    35%
  EEP (MAXAD)          0.01%  0.01%  0.01%  0.03%  0.58%  4.51%  18.47% 37.94%
  UEP (MAXAD)          0.01%  0.01%  0.01%  0.01%  0.02%  0.71%  5.65%  20.10%

Swiss Landscape (34.75 kB)
  Protection designed for 20% BEC
  Channel error rate   18%    19%    20%    21%    22%    23%    24%    25%
  EEP (MAXAD)          4.14%  4.14%  4.14%  4.18%  5.56%  19.01% 51.48% 81.23%
  UEP (MAXAD)          3.36%  3.36%  3.38%  3.54%  5.85%  13.24% 31.30% 60.98%
  Protection designed for 30% BEC
  Channel error rate   28%    29%    30%    31%    32%    33%    34%    35%
  EEP (MAXAD)          4.14%  4.14%  4.14%  4.16%  5.34%  13.70% 39.12% 71.19%
  UEP (MAXAD)          4.14%  4.14%  4.14%  4.15%  4.21%  5.57%  15.32% 42.00%


[Figure 4-9, part 1 of 2: MAXAD versus channel error rate for UEP and EEP; panels (a) and (b).]
Figure 4-9 (part 1 of 2): EEP versus proposed UEP approach for channel
mismatches; both EEP and UEP assume a 20% BEC, while the actual bit erasure
rate is in the range 17%-25%. The results are for: (a) Mars Surface, and (b) Swiss
Landscape models respectively.
[Figure 4-9, part 2 of 2: MAXAD versus channel error rate for UEP and EEP; panel (c).]
Figure 4-9 (part 2 of 2): EEP versus proposed UEP approach for channel
mismatches; both EEP and UEP assume a 20% BEC, while the actual bit erasure
rate is in the range 17%-25%. The results are for (c) Feline model.

[Figure 4-10, part 1 of 2: MAXAD versus channel error rate for UEP and EEP; panel (a).]
Figure 4-10 (part 1 of 2): EEP versus proposed UEP approach for channel
mismatches; both EEP and UEP assume a 30% BEC, while the actual bit erasure
rate is in the range 27%-35%. The results are for: (a) Mars Surface, and (b) Swiss
Landscape models respectively.
[Figure 4-10, part 2 of 2: MAXAD versus channel error rate for UEP and EEP; panels (b) and (c).]
Figure 4-10 (part 2 of 2): EEP versus proposed UEP approach for channel
mismatches; both EEP and UEP assume a 30% BEC, while the actual bit erasure
rate is in the range 27%-35%. The results are for (c) Feline mode.

The MAXAD versus the actual bit erasure rate, for both the UEP and EEP approaches operating under the same channel conditions, is depicted in Figure 4-9 for 20% BEC, and in Figure 4-10 for 30% BEC. Additionally, a visual comparison
illustrating the differences between the original and decoded vertex positions for the
EEP and the proposed UEP approach is given in Figure 4-11 for the Swiss
Landscape model. In this figure both EEP and UEP assume a 20% BEC, while the
actual bit erasure rate is 23%.

(a)

(b)
Figure 4-11: Differences (in %) between the original and decoded vertex positions
for the Swiss Landscape mesh: (a) EEP versus the proposed (b) UEP approach; both
assume a 20% BEC, while the actual error rate is 23%.
These results show that when the actual channel error rate matches the assumed
rate, the differences between the two approaches are negligible. However, the UEP
approach is capable of providing a better resilience against errors, in particular for
large channel mismatches. The results in Figure 4-11 show that the differences
between the original and decoded vertex positions are more significant (in number
and amplitude) for EEP versus UEP. This confirms that in joint source and channel
coding, UEP should be favored over EEP.
4.5 DEMONSTRATION OF SCALABLE CODING AND
TRANSMISSION FOR MESHGRID
One domain where the proposed JSCC approach demonstrates its benefits is
scalable coding and transmission of meshes over wireless channels. In such settings,
wireless communication towards mobile terminals (e.g. PDAs) with very limited
graphics and processing power requires resilience against high error-rates, fast
optimization of the rate-allocation and real-time execution. All these constraints are
met by the proposed JSCC approach. In order to demonstrate this concept, we have
actually implemented such an application (see Figure 4-12), exploiting the streaming
capabilities of MESHGRID (see section 2.3.3).
The system performs scalable transmission over a wireless (UDP) channel of
MESHGRID encoded objects from a base station towards a mobile terminal. The
server application, running on the base station, is a content provider of 3D scenes
represented in the MESHGRID format. The server establishes connections with the
client application, deployed on the PDA, and performs the streaming of the 3D
content towards the client. The connection between the two is bi-directional: there is
a downchannel from the server to the client used to send the MESHGRID bitstream,
and a backchannel in the opposite direction used by the client to send requests
(codec settings, required resolution and quality levels, required regions-of-interest,
etc.) to the server. Once a connection is established, the server progressively streams
the requested scene, described in XML, containing the compressed 3D model to the
client. A snapshot of the system running both the server and client applications has
been taken and depicted in Figure 4-13. The error protection at the server side, the
transmission and the rendering on the client-side are all performed in real-time. This
highlights the practical application of scalable MESHGRID coding and transmission
over wireless channels towards terminals with limited graphics and computational
capabilities.
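A heavily simplified Python sketch of such a transmission loop is given below, purely to illustrate the downchannel/backchannel split. The real application uses the MESHGRID bitstream syntax, FEC packetisation and an XML scene description, none of which are reproduced here; the port number, request format and helper select_layers are invented for the example.

    import socket

    def run_server(packets, port=9000):
        # Serve pre-packetised (FEC-protected) stream chunks over UDP on request.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("0.0.0.0", port))
        while True:
            request, client = sock.recvfrom(1024)   # backchannel: client request
            # e.g. b"resolution=2 quality=3 roi=full" (invented message format)
            for chunk in select_layers(packets, request):
                sock.sendto(chunk, client)          # downchannel: bitstream packets

    def select_layers(packets, request):
        # Placeholder: a real server filters the scalable bitstream by the
        # requested resolution, quality layers and regions-of-interest.
        return packets

Because the source codec is scalable, serving a request amounts to selecting a subset of already-encoded layers, which is what keeps the server side light enough for real-time operation.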
Figure 4-12: Client-Server scenario for interactive display of MESHGRID objects: (a)
direct connection; (b) indirect connection; (c) the newly accepted connection is
added to the clients list; (d) the client requests the desired MESHGRID stream
components; (e) the server sends only the requested parts from the bitstream.

Figure 4-13: System performing an error-resilient scalable transmission of
MESHGRID encoded objects from a base station (the laptop running both the server
and the client applications) towards a mobile terminal (the PDA) over a wireless
(UDP) channel.
[Figure 4-12 diagram components: client browsers with a display, XML scene/camera/track/MESHGRID parsers and the MESHGRID decoder; the server holding XML scene, camera, texture, track and mesh info plus the MESHGRID stream; a listening socket, a pool of per-client socket threads and a SOCKS5 proxy for indirect connections, corresponding to steps (a)-(e) in the caption above.]
4.6 CONCLUSIONS
The chapter proposes a novel approach for scalable joint source and channel
coding of meshes. An unequal error protection approach is followed, to deal with the
different error-sensitivity levels characterizing the various resolution and quality
layers produced by the scalable source codec. A JSCC problem is solved, wherein
the estimated distortion is minimized subject to a total rate constraint. The number
of layers for each source and the code rates for each layer are simultaneously
determined subject to a total bit budget. In this context, we propose a novel fast
algorithm for solving the constrained-optimization problem, whose complexity is
much lower than that of state-of-the-art. The proposed JSCC algorithm is applicable
to any scalable mesh codec and is illustrated for the specific case of MESHGRID.
Furthermore, in contrast to other JSCC methods existing in the literature, in our
approach the JSCC problem is formulated and solved for both the L-infinite and the
classical L-2 distortion metrics. Optimizing the rate allocation subject to an L-
infinite (i.e. MAXAD) bound is to our knowledge a unique feature in mesh coding.
In terms of performance, numerical results show that, similar to the error-free case, the L-infinite norm is a better option than the L-2 norm in an error-prone setting,
particularly in low-rate coding of meshes.
The experiments demonstrate that UEP provides superior results compared to
EEP, especially in case of channel mismatches. This result could be anticipated: due
to the fact that UEP better protects the more important parts of the bitstream and
provides less protection to the others, the important data can be recovered even if the
amount of errors is larger than predicted. In addition, the proposed unequal error
protection approach proved to surpass the state-of-the-art scheme in terms of both
distortion and execution time, which clearly favor the new JSCC approach.
It is important to observe also that, since the proposed JSCC approach employs
FECs on a per-packet basis, it allows for preserving the original scalability features
and animation capabilities of the employed scalable source codec. In the context of
MESHGRID, this is of key importance, since MESHGRID is an MPEG-4 AFX
standard. We show also that the proposed JSCC rate-allocation algorithm allows for
real-time execution, which is, to our knowledge, unique in the context of error-
resilient coding of meshes. This is also particularly important in the context of
MPEG-4, from the perspective of developing an error-resilient coding profile for
MESHGRID. We conclude that the proposed JSCC approach offers resilience against
transmission errors, provides graceful degradation, enables real-time
implementations, and preserves all the scalability features and animation capabilities
of the employed source codec.


Chapter 5
CODING OF DYNAMIC MESHES
BASED ON MESHGRID
5.1 INTRODUCTION
Dynamic meshes can be used to reproduce the motion of real life objects, the
animation of cartoon-alike objects, the dynamics of simulation data, or any other
types of dynamic models. Compared to static models, the data rates required by
dynamic models are significantly higher, posing significant demands both in terms
of storage, as well as in transmission scenarios, in particular when performed over
channels with limited bandwidth. Therefore, in many applications it would be useful
to provide the means allowing for an efficient encoding of dynamic meshes and
enabling 3D video rendering with free viewpoint reconstruction.
In this chapter, we evaluate the coding performance of MESHGRID when used to
encode a time-varying sequence of a 3D mesh. In this context, the concept of L-
infinite mesh coding is extrapolated from static models to dynamic models. The
considered scenario is simple, that is, the mesh connectivity is assumed to remain
the same in the entire sequence and only the vertex coordinates change in time. The
obtained bitstream, which encodes all the frames from the time sequence, follows
entirely the standardized specifications of the MESHGRID bitstream.
5.2 DYNAMIC-MESH CODING APPROACH
The L-infinite coding approaches introduced in Chapter 3 are not solely limited to
static models, but they can be extended to dynamic models as well. In our approach,
the full model, i.e. the mesh connectivity and the reference-grid, is encoded once for the first frame, which acts as a reference model for the entire sequence. For the following frames, the connectivity-wireframe remains unchanged and only the reference-grid coordinates are modified. Hence, the system needs to encode the connectivity-wireframe only once. Also, the differences in the reference-grid
coordinates between successive frames are encoded in the same way as for the static
models.
The basic architecture of the proposed dynamic mesh coding system is shown in
Figure 5-1. Let $\mathcal{M}(t)$ represent a mesh-model at a certain time instance $t$, and denote by $\mathcal{M}(t,\mathbf{v})$ an arbitrary vertex with coordinates $\mathbf{v}$ in this mesh. Also, denote by $\delta_M$ the target MAXAD for each frame in the dynamic sequence.
[Figure 5-1 diagram: a chain of L-infinite encoder/decoder stages; the first frame M(1) is coded directly at the target MAXAD, and every subsequent frame M(t) is predicted from the reconstructed previous frame, with only the error-frame e(t) being L-infinite coded.]
Figure 5-1: Basic architecture of the proposed MESHGRID-based coding system for
dynamic sequences.
In a first step, the encoding system performs an L-infinite encoding of the first frame, $\mathcal{M}(1)$, using the target MAXAD $\delta_M$, immediately followed by a decoding of this frame. We note that the entropy encoding/decoding modules are deactivated in order to speed up this process. As shown in section 3.7, the difference in vertex positions between the original mesh $\mathcal{M}(1)$ and the reconstructed mesh $\widetilde{\mathcal{M}}(1)$ is upper-bounded by the MAXAD, that is:

$\left| \mathcal{M}(1,\mathbf{v}) - \widetilde{\mathcal{M}}(1,\mathbf{v}) \right| \le \delta_M, \quad \forall\, \mathbf{v} \in \mathcal{M}(1)$.   (5.1)

In a second coding step, corresponding to $t = 2$, the proposed dynamic-mesh coding system uses the reconstructed frame $\widetilde{\mathcal{M}}(1)$ as a predictor for the current frame $\mathcal{M}(2)$ (see Figure 5-1). The reference-grid coordinate difference between the two (or error-frame), namely $e(2) = \mathcal{M}(2) - \widetilde{\mathcal{M}}(1)$, is encoded in the same way as for the static models, using the L-infinite encoder operating at the target MAXAD $\delta_M$. The subsequent decoding of the error-frame $e(2)$ produces $\widetilde{e}(2)$. Similar to (5.1), we then have:

$\left| e(2,\mathbf{v}) - \widetilde{e}(2,\mathbf{v}) \right| \le \delta_M, \quad \forall\, \mathbf{v} \in e(2)$.   (5.2)

In a third step, the reconstructed error-frame $\widetilde{e}(2)$ is added back to the prediction $\widetilde{\mathcal{M}}(1)$ to produce the reconstructed frame $\widetilde{\mathcal{M}}(2) = \widetilde{\mathcal{M}}(1) + \widetilde{e}(2)$, which is subsequently used in the prediction of the third frame $\mathcal{M}(3)$, and so on.

It can be shown that, with the proposed architecture, the maximum absolute difference between the vertex positions in the original frame $\mathcal{M}(2)$ and the vertex positions in the reconstructed frame $\widetilde{\mathcal{M}}(2)$ is upper-bounded by the target MAXAD $\delta_M$. Indeed:

$\left| \mathcal{M}(2,\mathbf{v}) - \widetilde{\mathcal{M}}(2,\mathbf{v}) \right| = \left| \mathcal{M}(2,\mathbf{v}) - \widetilde{\mathcal{M}}(1,\mathbf{v}) - \widetilde{e}(2,\mathbf{v}) \right| = \left| e(2,\mathbf{v}) - \widetilde{e}(2,\mathbf{v}) \right| \le \delta_M, \quad \forall\, \mathbf{v} \in e(2)$.   (5.3)

Since there is a one-to-one mapping between reference-grid coordinates in $e(2)$ and vertices in $\mathcal{M}(2)$, one concludes that:

$\left| \mathcal{M}(2,\mathbf{v}) - \widetilde{\mathcal{M}}(2,\mathbf{v}) \right| \le \delta_M, \quad \forall\, \mathbf{v} \in \mathcal{M}(2)$.   (5.4)
The encoding process detailed above is repeated recursively at every time-instance $t$ (see Figure 5-1). Similar to above, it can be shown that (5.4) holds for any $t$. Indeed:

$\left| \mathcal{M}(t,\mathbf{v}) - \widetilde{\mathcal{M}}(t,\mathbf{v}) \right| = \left| \mathcal{M}(t,\mathbf{v}) - \widetilde{\mathcal{M}}(t-1,\mathbf{v}) - \widetilde{e}(t,\mathbf{v}) \right| = \left| e(t,\mathbf{v}) - \widetilde{e}(t,\mathbf{v}) \right| \le \delta_M\ \ \forall\, \mathbf{v} \in e(t) \;\Rightarrow\; \left| \mathcal{M}(t,\mathbf{v}) - \widetilde{\mathcal{M}}(t,\mathbf{v}) \right| \le \delta_M, \quad \forall\, \mathbf{v} \in \mathcal{M}(t)$.   (5.5)
This shows that the proposed system performs an L-infinite constrained coding of
dynamic meshes, ensuring that the error in every frame in the sequence is bounded
by the target MAXAD.
As a final remark in this section, we note that the complexity of the encoder is
significantly higher than that of the decoder. This is typical for predictive coding
approaches, and is caused by the fact that both encoding and decoding need to be
performed at the encoder side. This is done in order to prevent the temporal
propagation of prediction errors and to ensure that, at any time instance, the encoder
and decoder stay perfectly synchronized.
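The closed-loop prediction of Figure 5-1 can be summarised by the Python sketch below, in which frames are assumed to be numeric arrays of reference-grid coordinates, and linf_encode/linf_decode stand in for the L-infinite encoder and decoder of Chapter 3 operating at the target MAXAD; these names are placeholders, not the actual MESHGRID implementation.

    def encode_sequence(frames, maxad, linf_encode, linf_decode):
        # Frame 1 is intra-coded; every later frame is predicted from the
        # *reconstructed* previous frame so that encoder and decoder stay
        # synchronised and every frame satisfies the target MAXAD.
        bitstream = [linf_encode(frames[0], maxad)]
        recon = linf_decode(bitstream[0])            # reconstructed first frame
        for frame in frames[1:]:
            error = frame - recon                    # error-frame e(t)
            code = linf_encode(error, maxad)         # coded with the same MAXAD bound
            bitstream.append(code)
            recon = recon + linf_decode(code)        # reconstructed frame for prediction
        return bitstream

Because the reconstruction error of frame t equals e(t) minus its reconstruction, which the L-infinite encoder bounds by the target MAXAD, every reconstructed frame in the sequence meets the same bound, in line with (5.5).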
5.3 EXPERIMENTAL RESULTS
The following set of experiments aims to explore the coding performance of the
proposed system for L-infinite-constrained coding of dynamic sequences and to
illustrate the applicability of MESHGRID in this context.
A volumetric animation of the RG points gives the same effect as a direct animation of the vertices [Salomie 2005, Salomie 2004b]; this is due to the fact that the vertices are attached to the RG and their coordinates are derived from the coordinates of the RG points. The advantage of using an RG-based animation is that
the animation can be defined in a hierarchical manner [Salomie 2005, Salomie
2004b]. An example is given in Figure 5-2, illustrating three frames obtained by a
volumetric animation of the Humanoid model.

(a) (b) (c)

Figure 5-2: Volumetric animation of the Humanoid model obtained by altering the
positions of the RG points.
The Humanoid sequence is the first sequence used in our experiments. This
sequence consists of 152 frames, with 7646 vertices and 15196 triangles per frame,
each frame being represented as a single resolution 3D mesh.
Figure 5-3: Humanoid sequence encoded using MAXAD targets (from left to right)
of: 0, 0.01, 0.1, 0.5, 1 and 2 (%).
The animated Humanoid sequence has been encoded using several MAXAD
constraints. The rate-allocation algorithm estimates the bit-planes that need to be
encoded for the reference-frame and for each error-frame such that the imposed
MAXAD constraint is satisfied at any time-instance.
The frames at a certain time-instance in the decoded sequences are illustrated in
Figure 5-3 for different MAXAD targets. In addition, Figure 5-4 shows the rate-
distortion curve for this experiment. Finally, Figure 5-5 illustrates the RMS measure
(relative to the surrounding box) over the bit-rate, computed with the M.E.S.H. tool
[Aspert 2002].
These results indicate that the bit-rate for the entire sequence can be dropped from
900 kbits/s (corresponding to the lossless representation) to 100 kbits/s
(corresponding to a MAXAD of 1%) with hardly any visual penalty on the
reconstructed model (see Figure 5-3 and Figure 5-4). Also, the RMS decays
gracefully, leading to the conclusion that the system is characterized by a smooth
decay not only in L-infinite sense (Figure 5-4) but also in L-2 sense (Figure 5-5).
Figure 5-6 shows the reconstructed models and the error distribution for several bit-
rates, obtained by imposing different MAXAD values. These results have been
obtained with the M.E.S.H. tool [Aspert 2002], which has been used in order to
measure the Hausdorff distance between the lossy-compressed frames and the
original ones.
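For completeness, a point-sampled approximation of the symmetric Hausdorff distance is sketched below in Python. The M.E.S.H. tool [Aspert 2002] samples the triangle surfaces densely before measuring distances, which this vertex-only sketch does not do, so it should be read as an illustration of the metric rather than a substitute for the tool.

    import numpy as np

    def hausdorff(points_a, points_b):
        # Symmetric Hausdorff distance between point sets of shape (N, 3) and (M, 3).
        a = np.asarray(points_a, dtype=float)
        b = np.asarray(points_b, dtype=float)
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
        return max(d.min(axis=1).max(),   # farthest point of a from its nearest b
                   d.min(axis=0).max())   # farthest point of b from its nearest a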

[Figure 5-4 chart: MAXAD (%) versus bit-rate (kbit/s).]
Figure 5-4: Rate-distortion curve for the Humanoid sequence encoded at different
MAXAD values (%).

Figure 5-5: The RMS measure relative to the surrounding box over bit-rate,
computed with the M.E.S.H. tool for the Humanoid sequence.
These results indicate that the vertex errors are upper-bounded and that, for each
rate point, the vertex errors are mostly concentrated in the low- and mid-range (the
colors are mostly blue and green). All these experiments demonstrate that the
proposed L-infinite coding approach can be successfully extended towards
compression of dynamic models.

[Figure 5-6, part 1: panels at 214 bytes/frame, 454 bytes/frame, 1.05 Kbytes/frame and 1.76 Kbytes/frame.]
Figure 5-6 (part 1 of 2): Humanoid sequence: distribution on the surface (right) and
histogram of the coding errors (left) with respect to the non-compressed 3D frame
at different bitrates measured using the Hausdorff distance.

[Figure 5-6, part 2: panels at 2.43 Kbytes/frame and 3.03 Kbytes/frame.]
Figure 5-6 (part 2 of 2): Humanoid sequence: distribution on the surface (right) and
histogram of the coding errors (left) with respect to the non-compressed 3D frame
at different bitrates measured using the Hausdorff distance.
The second group of tests aims to assess the feasibility of encoding morphing
sequences. 3D morphing is usually used for animation, e.g. to create special effects for the entertainment industry, but it can also be employed as a modeling tool in which existing shapes are combined to obtain new ones. Several morphing techniques have been designed in the literature [Ahn 2002, Lee 1999], but they are specific to the representation employed for the models. As mentioned in the literature, several issues appear when morphing models of different genus or topology, such as the detection of topological similarities between the meshes to be morphed [Lee 1999].
By construction, for MESHGRID, any deformation performed to the volume is
transferred to the surface and vice versa. Moreover, a MESHGRID object encodes the
information within a certain 3D space, which may contain several non-connected
entities. During morphing some of these entities may merge while others may split,
but each time-instance (3D frame) is derived from the same MESHGRID model (see
Figure 5-7). It is important to remark that, with MESHGRID, changes in the topology of the mesh are allowed, i.e. the genus of the mesh can change; allowing for topological changes is an important advantage of MESHGRID. Notice that in the example in Figure 5-7 each object consists of two surface layers; the outside surface layer is transparent and soft (it deforms), while the inside layer is solid.

Figure 5-7: An example of 3D morphing of molecules. Each image represents the
same MESHGRID model as it evolves in time. Changes in the topology of the mesh
are allowed, i.e. the genus of the mesh can change.
Encoding a morphing animation with MESHGRID is an ROI-based extension of
the system used in the first set of experiments. That is, the first 3D frame needs to be
encoded fully, as done for a static model, while for each subsequent morphed 3D
frame one needs to identify for each ROI what changes occurred (if any) with
respect to the previous frame and encode those differences.

For each ROI one may choose the most compact way to encode these differences:
(i) encode the CW and choose a uniformly distributed RG for which only its corners
need to be encoded, (ii) update the vertex offsets, (iii) keep the same CW but update
the RG, or (iv) some combination of these.
We remark that for scenes requiring topological changes, one may need to encode
the CW at each time-instance. We also point out that, at certain time instances, one might consider encoding the 3D frame fully, as one would insert key frames in a 2D video sequence, rather than as an error-frame. This can be done in situations when the past frame is not a sufficiently good prediction of the current frame.



(a) (b) (c) (d)
Figure 5-8: Frames (a) to (d) from a morphing sequence simulating: (top) "MeltPlast", the dynamic sequence of melting plastic objects (two consecutive images are 10 frames apart); (bottom) "Blobs", five bouncing blobs (two consecutive images are 5 frames apart).
Figure 5-8 illustrates the topological changes in the "MeltPlast" and "Blobs" morphing sequences. These sequences are generated by applying transformations in the time domain to composite implicit surface descriptions. The 3D frames are obtained from these composite implicit surface descriptions by using TRISCAN [Salomie 2001, Salomie 2005, Salomie 2004b]. The "MeltPlast" sequence (Figure 5-8, top) is composed of 249 frames, with 16000 triangles and 32000 vertices on average, while the "Blobs" sequence (Figure 5-8, bottom) consists of 11136 triangles and 5574 vertices on average.

5bpv (10.3 Kbytes/frame) 7bpv (14.5 Kbytes/frame) 9bpv (18.6 Kbytes/frame)
Figure 5-9: Coding efficiency at 5, 7 and 9 bpv for the MeltPlast sequence: (top)
decoded 3D frames, (bottom) the distribution on the surface and the histogram of
the errors with respect to the non-compressed 3D frame (Hausdorff distance).
For each intermediate 3D frame obtained during the metamorphosis, the CW is
re-encoded due to dramatic changes in the topology or genus of the mesh (both the
connectivity between the vertices and their number are altered). However, the RG is
kept unchanged for the entire sequence. Even when the CW changes from one frame
to another, the transition is smooth and natural (Figure 5-8). Thus, to generate the
animation bitstream, the first frame is fully encoded, while for each following frame
only the CW is encoded and added to the bitstream.
To assess the coding efficiency, the "MeltPlast" sequence is encoded lossily at different bitrates, and the decoded 3D frames are compared with the non-compressed 3D frames by measuring the Hausdorff distance [Aspert 2002] between the surfaces of the corresponding 3D frames. Since for this model the reference-grid is distributed uniformly, the bit-rates are specified by imposing different values for the number of bitplanes used to quantize the vertex offsets (bpo) [ISO/IEC 2004, Salomie 2005, Salomie 2004b]. Note that the total number of bits per vertex (bpv) is computed as 4 plus the number of bits per offset (bpo), where 4 is the number of bits per vertex used to encode the connectivity (which is fixed); the 5, 7 and 9 bpv rates used below thus correspond to 1, 3 and 5 bits per offset, respectively. The quality improvement resulting from increasing the rate is illustrated in Figure 5-9 for three different bitrates, i.e. 5, 7 and 9 bpv. The first row shows the decoded 3D frame,
while the second row illustrates the histograms and spatial distribution of errors.
5.4 CONCLUSIONS
In this chapter, we demonstrate that the MESHGRID system, enhanced by the L-
infinite distortion measure proposed in Chapter 3, can be used to efficiently encode
3D dynamic models in a scalable and compact way. Since MESHGRID allows
subdividing the space into ROIs, memory-efficient algorithms can also be implemented. Moreover, the 3D sequences can be generated as scalable MPEG-4
streams [ISO/IEC 2004], which can be played back as a free-viewpoint interactive
3D animation.
We note that the approach presented in this chapter is rather exploratory and
demonstrative. Further coding performance improvements are easily achievable, for
instance, by borrowing and applying temporal prediction techniques from video
coding.



Chapter 6
CONCLUSIONS AND PROSPECTIVE
WORK

6.1 CONCLUSIONS
This dissertation introduces the novel concept of scalable L-infinite-oriented
coding of static and dynamic models. A thorough analysis of several design options
reveals that an intra-band wavelet-based coding approach should be followed in
order to provide fine-granular scalability in L-infinite sense. In this context, a novel
approach for scalable wavelet-based coding of meshes is proposed, which allows for
minimizing the rate subject to an L-infinite distortion constraint. Two L-infinite
distortion estimators are presented, expressing the L-infinite distortion in the spatial
domain as a function of quantization errors produced in the wavelet domain. Based
on these, the proposed L-infinite codec optimizes the rate allocation for which the L-
infinite distortion (and consequently the Hausdorff distance) is upper-bounded by a
user-defined bound, and guaranteed to be below that bound. This is an interesting
and unique feature in the context of 3D object coding.
The proposed approach preserves all the scalability features and animation
capabilities of the employed scalable mesh codec and allows for fast, real-time
implementations of the rate-allocation. These are particularly important in real-time
applications and in the context of MPEG-4 AFX. With respect to the latter, the
proposed approach allows for developing a scalable L-infinite coding extension of
the MESHGRID system, without changing the characteristics and/or the existing
syntax of this MPEG-4 standard.
Apart from these, a data-dependent L-2 estimator is also proposed, significantly
improving the coding performance at low rates of the original MPEG-4 AFX
MESHGRID coding system. Based on the experimental results, we conclude that a
data-dependent L-2 estimator is sufficient for applications for which geometry
accuracy is not critical. However, L-infinite coding is the only available option for
applications for which preserving geometry accuracy is compulsory.
The second part of the thesis proposes a novel approach for scalable joint source
and channel coding of meshes. An unequal error protection approach is followed, to
deal with the different error-sensitivity levels characterizing the various resolution
and quality layers produced by the scalable source codec. A JSCC problem is
solved, wherein the estimated distortion is minimized subject to a total rate
constraint. The number of layers for each source and the code rates for each layer
are simultaneously determined subject to a total bit budget. In this context, we
propose a novel fast algorithm for solving the constrained-optimization problem,
whose complexity is lower than that of similar algorithms. The proposed JSCC
algorithm is applicable to any scalable mesh codec and is illustrated for the specific
case of MESHGRID.
Furthermore, in contrast to other JSCC methods existing in the literature, in our
approach the JSCC problem is formulated and solved for both the L-infinite and the
classical L-2 distortion metrics. Optimizing the rate allocation subject to an L-
infinite (i.e. MAXAD) bound is to our knowledge a unique feature in mesh coding.
It is shown that solving an L-infinite-constrained optimization problem is equivalent
to finding a rate allocation such that the Hausdorff distance at the decoded resolution
is upper-bounded. This is interesting from the perspective of finding an optimum
rate allocation such that the maximum error in the vertex positions is upper-
bounded. In terms of performance, numerical results show that, similar to the error-free case, the L-infinite norm is a better option than the L-2 norm in an error-prone
setting, particularly in low-rate coding of meshes.
The experimental results demonstrate the benefits brought by error-resilient
coding of meshes. The unequal error protection approach proved to surpass both
EEP and NEP schemes. We note that, while UEP and EEP are undoubtedly superior
to NEP in an error-prone setting, the NEP scheme performs better if the channel is
error free, which is due to the redundant information added by any error-protection
scheme. The experiments demonstrate that UEP provides superior results compared
to EEP, especially in case of channel mismatches. This result could be anticipated:
due to the fact that UEP better protects the more important parts of the bitstream and
provides less protection to the others, the important data can be recovered even if the
amount of errors is larger than predicted.
It is important to observe also that, since the proposed JSCC approach employs
FECs on a per-packet basis, it allows for preserving the original scalability features
and animation capabilities of the employed scalable source codec. In the context of
MESHGRID, this is of key importance, since MESHGRID is an MPEG-4 AFX
standard. We show also that the proposed JSCC rate-allocation algorithm allows for
real-time execution, which is, to our knowledge, unique in the context of error-
resilient coding of meshes. This is also particularly important in the context of
MPEG-4, from the perspective of developing an error-resilient coding profile for
MESHGRID. We conclude that the proposed JSCC approach offers resilience against
transmission errors, provides graceful degradation, enables real-time
implementations, and preserves all the scalability features and animation capabilities
of the employed source codec.
6.2 PROSPECTIVE WORK
A potential continuation of the work presented in this dissertation could be, for example, to extend it in practice to other 3D graphics compression schemes and to research ways of adapting and improving it in these new application scenarios. In
this context, a very interesting idea might be to investigate ways to broaden the L-
infinite distortion metric proposed in Chapter 3 to a novel and very promising
coding technique based on wavelet subdivision surfaces, which is currently being
developed at our department. This coding technique in particular can greatly benefit
from the new distortion metric since it involves remeshing and subdivision methods,
which can directly profit from the proposed mechanisms for accurate local control of
the generated errors.
Additionally, another area of research might include exploring watermarking
algorithms for meshes that take advantage of the proposed distortion metric and/or
the error-resilient coding technique. By definition, a watermarking scheme aims to
embed as much as possible supplementary information to the existing data, while
ensuring a minimum induced distortion. Therefore, maximizing the quantity of
inserted information subject to a bound on distortion can be achieved by extending
the ideas presented in Chapter 4. Such an approach would allow for modifying the
wavelet coefficients in each wavelet subband while guaranteeing a distortion bound
on each vertex by means of local error control.
Another important aspect to be investigated in the future is the relation between
the subjective quality assessment of mesh compression algorithms, and how this
relates to mathematically defined distortion metrics such as the L-2 and L-infinite
distortions. While subjective quality metrics have been defined for images, extending such work towards mesh-geometry compression remains to be
investigated.
Finally, further improving the efficiency of the coding scheme for dynamic
meshes proposed in Chapter 5 could be another challenging research topic.
Significant improvements are expected to be achieved by applying, for instance,
temporal prediction techniques combined with wavelet-based encoding of the
prediction errors.


LIST OF PUBLICATIONS
ISI Journal Publications
1. A. Munteanu, D. C. Cernea, A. Alecu, J. Cornelis, P. Schelkens, "Scalable L-infinite coding of Meshes," to be published in IEEE Transactions on Visualization and Computer Graphics, 2009. (SCI of 2008: 2.445).
2. D. C. Cernea, A. Munteanu, A. Alecu, J. Cornelis, P. Schelkens, "Scalable Joint Source and Channel Coding of Meshes," IEEE Transactions on Multimedia, vol. 10, no. 3, pp. 503-513, March 2008. (SCI of 2008: 2.288).
3. I. A. Salomie, R. Deklerck, D. C. Cernea, A. Markova, A. Munteanu, P. Schelkens, and J. Cornelis, "Special Effects: Efficient and Scalable Encoding of the 3D Metamorphosis Animation with MeshGrid," Lecture Notes in Computer Science, Springer Berlin, vol. 3767, pp. 84-95, 2005 (SCI of 2005: 0.402).
Conference Publications with Peer Review
4. D. C. Cernea, A. Munteanu, J. Cornelis, P. Schelkens, "Statistical L-Infinite Distortion Estimation In Scalable Coding of Meshes," IEEE Workshop on Multimedia Signal Processing, MMSP 2008, Cairns, Australia, October 8-10, 2008.
5. D. C. Cernea, A. Munteanu, J. Cornelis, P. Schelkens, "Scalable Coding and Transmission of Meshes using MeshGrid," International Conference on Computer Games, Animation, and Multimedia, CGAT 2008, Singapore, pp. 1-4, April 2008.
6. D. C. Cernea, Adrian Munteanu, Alin Alecu, Jan Cornelis and Peter Schelkens,
"Joint Source and Channel Coding of MESHGRID-represented Objects," Picture
Coding Symposium, PCS 2007, Lisbon, Portugal, pp. 1-4, 7-9 November 2007.
7. D. C. Cernea, A. Munteanu, M. Stoufs, A. Alecu, J. Cornelis, and P. Schelkens, "Unequal error protection of the reference grid for robust transmission of MeshGrid-represented objects over error-prone channels," SPIE International Symposium on Optics East 2006, Wavelet Applications in Industrial Processing IV, vol. 6383, pp. 1-10, Boston, MA, USA, October 2006.
8. A. Markova, R. Deklerck, D. C. Cernea, I. A. Salomie, A. Munteanu, and P. Schelkens, "Addressing view-dependent decoding with MeshGrid," Signal Processing Symposium, SPS 2006, pp. 71-74, Antwerp, Belgium, March 2006.
9. D. C. Cernea, I. A. Salomie, A. Alecu, P. Schelkens, and A. Munteanu, "Wavelet-based scalable L-infinity-oriented coding of MPEG-4 MESHGRID surface models," SPIE Optics East, Wavelet applications in industrial processing, Boston, Massachusetts, USA, pp. 1-10, October 23-26, 2005.

10. A. Salomie, A. Munteanu, R. Deklerck, D. C. Cernea, P. Schelkens, and J. Cornelis,
"From Triscan surface extraction to MeshGrid surface representation from MPEG-
4," IASTED International Conference on Computer Graphics and Imaging, CGIM
2004, Kauai, Hawaii, USA, pp. 61-67, August 17-19, 2004.
MPEG Standardization Contributions
11. D. C. Cernea, A. Munteanu, M. Stoufs, A. Alecu, J. Cornelis, P. Schelkens, "Error-resilient profile for MeshGrid: robust encoding of the reference-grid," ISO/IEC JTC1/SC29/WG11 (MPEG), Hangzhou, China, MPEG Report M13883, October 23-27, 2006.
12. D. C. Cernea, A. Markova, I. A. Salomie, A. Alecu, P. Schelkens, A. Munteanu, R.
Deklerck, "Updates to the AFXEncoder related to MeshGrid," ISO/IEC
JTC1/SC29/WG11 (MPEG), Nice, France, MPEG Report M12612, October 17-21,
2005.
13. I. A. Salomie, R. Deklerck, D. C. Cernea, A. Markova, A. Munteanu, P. Schelkens,
"Updates to MeshGrid," ISO/IEC JTC1/SC29/WG11 (MPEG), Poznan, Poland,
MPEG Report M12377, July 25-29, 2005.
14. A. Salomie, D. C. Cernea, A. Markova, J. Lievens, R. Deklerck, A. Munteanu, and
P. Schelkens, "Encoding of dynamic meshes with MeshGrid (Part 2)," ISO/IEC
JTC1/SC29/WG11 (MPEG), Busan, Korea MPEG Report M12061, April 18-22,
2005.
15. A. Salomie, D. C. Cernea, A. Munteanu, and P. Schelkens, "MeshGrid
implementation into AFX encoder: Donation to ISO," ISO/IEC JTC1/SC29/WG11
(MPEG), Hong Kong, China, MPEG Report M11725, January 17-21, 2005.



REFERENCES
[Ahn 2002] M. Ahn and S. Lee, "Mesh metamorphosis with topology
transformations," Proceedings. 10th Pacific Conference on Computer Graphics
and Applications, pp. 481 - 482, 2002.
[Al-Regib 2002] G. Al-Regib and Y. Altunbasak, "An Unequal Error Protection
Method for Packet Loss Resilient 3-D Mesh Transmission," Proceedings of
IEEE INFOCOM, New York City, NY, Vol. 2, pp. 743-752, June 2002.
[Al-Regib 2005a] G. Al-Regib, Y. Altunbasak, and R. M. Mersereau, "Bit
Allocation for Joint Source and Channel Coding of Progressively Compressed
3-D Models," IEEE Transactions on Circuits and Systems for Video
Technology, vol. 15, no. 2, February 2005.
[Al-Regib 2005b] G. Al-Regib, Y. Altunbasak, and J. Rossignac, "Error-resilient
transmission of 3D models," ACM Transactions on Graphics, vol. 24, no. 2, pp.
182-208, April 2005.
[Al-Regib 2005c] G. Al-Regib, Y. Altunbasak, and J. Rossignac, "An unequal error
protection method for progressively transmitted 3D models," IEEE
Transactions on Multimedia, vol. 7, no. 4, pp. 766-776, August 2005.
[Albanese 1996] A. Albanese, J. Blömer, J. Edmonds, M. Luby, and M. Sudan,
"Priority Encoding Transmission," IEEE Transactions on Information Theory,
vol. 42, no. 6, pp. 1737-1744, November 1996.
[Alecu 2001] A. Alecu, A. Munteanu, P. Schelkens, J. Cornelis, and S. Dewitte,
"MAXAD Distortion Minimization for Wavelet Compression of Remote
Sensing Data," Proceedings of SPIE Mathematics of Data/Image Coding,
Compression and Encryption IV, with Applications, San Diego, California,
USA, Vol. 4475, pp. 149-160, July 29 - August 3, 2001.
[Alecu 2003a] A. Alecu, A. Munteanu, P. Schelkens, J. Cornelis, and S. Dewitte,
"On the Optimality of Embedded Deadzone Scalar-Quantizers for Wavelet-
based L-infinite-constrained Image Coding," Proceedings of Data Compression
Conference, DCC 2003, pp. 10, 2002.
[Alecu 2003b] A. Alecu, A. Munteanu, P. Schelkens, J. Cornelis, and S. Dewitte,
"Wavelet-based Fixed and Embedded L-infinite-constrained Image Coding,"
SPIE Journal of Electronic Imaging, vol. 12, no. 3, pp. 522-538, July 2003.
[Alecu 2004] A. Alecu, A. Munteanu, J. Cornelis, S. Dewitte, and P. Schelkens, "On
the Optimality of Embedded Deadzone Scalar-Quantizers for Wavelet-based L-
infinite-constrained Image Coding," IEEE Signal Processing Letters, vol. 11,
no. 3, pp. 367-371, March 2004.
[Alecu 2005] A. Alecu, "Wavelet-based Scalable L-infinity-oriented Coding,"
Electronics and Information Processing Department (ETRO), Vrije Universiteit
Brussel, Brussels, PhD Thesis, 2005.

[Alecu 2006] A. Alecu, A. Munteanu, J. Cornelis, and P. Schelkens, "Wavelet-based
Scalable L-infinity-oriented Compression," IEEE Transactions on Image
Processing, vol. 15, no. 9, pp. 2499-2512, September 2006.
[Alliez 2001] P. Alliez and M. Desbrun, "Progressive encoding for lossless
transmission of triangle meshes," Proceedings of SIGGRAPH 2001, pp. 198-
205.
[Alliez 2003] P. Alliez and C. Gotsman, "Recent advances in compression of 3-D
meshes," Proceedings of Symposium on Multiresolution in Geometric
Modeling, September 2003.
[Ansari 1998] R. Ansari, N. Memon, and E. Ceran, "Near-lossless Image
Compression Techniques," Journal of Electronic Imaging, vol. 7, no. 3, pp.
486-494, July 1998.
[Aspert 2002] N. Aspert, D. Santa-Cruz, and T. Ebrahimi, "MESH: Measuring error
between surfaces using the Hausdorff distance," Proceedings of IEEE
International Conference on Multimedia and Expo 2002 (ICME), pp. 705-708,
August 2002.
[Avcibas 2002] I. Avcibas, N. Memon, B. Sankur, and K. Sayood, "A progressive
Lossless/Near-Lossless image compression algorithm," IEEE Signal
Processing Letters, vol. 9, no. 10, pp. 312-314.
[Banister 2002] B. A. Banister, B. Belzer, and T. R. Fischer, "Robust image
transmission using JPEG2000 and turbo-codes," IEEE Signal Processing
Letters, vol. 9, pp. 117-119, April 2002.
[Benedens 1999] O. Benedens, "Geometry-based watermarking of 3-D models,"
IEEE Computer Graphics and Applications, vol. 19, no. 1, pp. 46-55, January
1999.
[Bors 2006] A. G. Bors, "Watermarking mesh-based representations of 3-D objects
using local moments," IEEE Transactions on Image Processing, vol. 15, no. 3,
pp. 687-701, March 2006.
[Cernea 2005] D. Cernea, I. A. Salomie, A. Alecu, P. Schelkens, and A. Munteanu,
"Wavelet-based scalable L-infinity-oriented coding of MPEG-4 MeshGrid
surface models," Proceedings of SPIE Optics East, Wavelet applications in
industrial processing, Boston, Massachusetts, USA, Vol. 6001, October 23-26,
2005.
[Cernea 2008a] D. C. Cernea, A. Munteanu, A. Alecu, J. Cornelis, and P. Schelkens,
"Scalable joint source and channel coding of meshes," IEEE Transactions on
Multimedia, vol. 10, no. 3, pp. 503-513, April 2008.
[Cernea 2008b] D. C. Cernea, A. Munteanu, J. Cornelis, and P. Schelkens,
"Statistical L-infinite Distortion Estimation in Scalable Coding of Meshes,"
Proceedings of Multimedia Signal Processing, Cairns, Australia, pp. 6,
09.10.2008.
[Chen 2005] Z. Chen, J. F. Barnes, and B. Bodenheimer, "Hybrid and forward error
correction transmission techniques for unreliable transport of 3D geometry,"
Multimedia Systems Journal, vol. 10, no. 3, pp. 230-244, March 2005.

[Chou 2002] P. H. Chou and T. H. Meng, "Vertex data compression through vector
quantization," IEEE Trans. Vis. Comput. Graph., vol. 8, no. 4, pp. 373382,
Apr. 2002.
[Cignoni 1998] P. Cignoni, C. Rocchini, and R. Scopigno, "METRO: measuring
error on simplified surfaces," Computer Graphics Forum, vol. 17, no. 2, pp.
167-174, June 1998.
[Daubechies 1998] I. Daubechies and W. Sweldens, "Factoring Wavelet Transforms
into Lifting Steps," Journal of Fourier Analysis and Applications, vol. 4, no. 3,
pp. 247-269, 1998.
[Gallager 1963] R. Gallager, "Low-Density Parity-Check Codes," Massachusetts
Institute of Technology, 1963.
[Gandoin 2002] P. M. Gandoin and O. Devillers, "Progressive lossless compression
of arbitrary simplicial complexes," ACM Transactions on Graphics, vol. 21, no.
3, pp. 372-379.
[Garland 1997] M. Garland and P. Heckbert, "Surface simplification using quadric
error metrics," Proceedings of SIGGRAPH 1997, pp. 209-216.
[Gotsman 2002] C. Gotsman, S. Gumhold, and L. Kobbelt, "Simplification and
compression of 3-D meshes," Tutorials on multiresolution in geometric
modelling.
[Hoppe 1996] H. Hoppe, "Progressive meshes," Proceedings of SIGGRAPH 1996,
pp. 99-108.
[Horn 1999] U. Horn, K. Stuhlmuller, M. Link, and B. Girod, "Robust internet video
transmission based on scalable coding and unequal error protection," Signal
Processing: Image Communication, vol. 15, pp. 77-94.
[Hsiang 2000] S.-T. Hsiang and J. W. Woods, "Embedded image coding using
zeroblocks of subband/wavelet coefficients and context modeling,"
Proceedings of IEEE International Symposium on Circuits and Systems
(ISCAS), Geneva, Switzerland, Vol. 3, pp. 662-665, May 28-31,2000.
[Hu 2005] X.-Y. Hu, E. Eleftheriou, and D.-M. Arnold, "Regular and irregular
progressive edge-growth Tanner graphs," IEEE Transactions on Information
Theory, vol. 51, no. 1, pp. 386-398, Jan. 2005.
[ISO/IEC 2004] ISO/IEC, "MPEG-4 AFX, Information technology Coding of
audio-visual objects Part 16: Animation Framework eXtension (AFX),"
ISO/IEC JTC1/SC29/WG11 (MPEG), 14496-16, Feb. 2004.
[Karni 2000] Z. Karni and C. Gotsman, "Spectral compression of mesh geometry,"
Proceedings of SIGGRAPH 2000, pp. 279-286.
[Karray 1998] L. Karray, P. Duhamel, and O. Rioul, "Image Coding with an L-
infinite Norm and Confidence Interval Criteria," IEEE Transactions on Image
Processing, vol. 7, no. 5, pp. 621-631, May 1998.
[Khodakovsky 2000] A. Khodakovsky, P. Schröder, and W. Sweldens, "Progressive geometry compression," Proceedings of SIGGRAPH 2000, pp. 271-278, 2000.
[Khodakovsky 2002] A. Khodakovsky and I. Guskov, "Normal mesh compression," Geometric Modelling for Scientific Visualization.
[Kompatsiaris 2001] I. Kompatsiaris, D. Tzovaras, and M. G. Strintzis,
"Hierarchical representation and coding of surfaces using 3D polygon meshes,"
IEEE Transactions on Image Processing, vol. 10, no. 8, August 2001.
[Kou 2001] Y. Kou, S. Lin, and M. P. C. Fossorier, "Low-density parity-check
codes based on finite geometries: a rediscovery and new results," IEEE
Transactions on Information Theory, vol. 47, no. 7, pp. 2711-2736, Nov. 2001.
[Kuhn 1951] H. W. Kuhn and A. W. Tucker, "Nonlinear programming,"
Proceedings of 2nd Berkeley Symposium, pp. 481-492, 1951.
[Lee 1999] A. Lee, D. Dobkin, W. Sweldens, and P. Schröder, "Multiresolution
Mesh Morphing," Proceedings of SIGGRAPH 99, pp. 343-350, August 1999.
[Li 2006] H. Li, M. Li, and B. Prabhakaran, "Middleware for streaming 3D
progressive meshes over lossy networks," ACM Transactions on Multimedia
Computing, Communications and Applications, vol. 2, no. 4, November 2006.
[Li 1998a] J. Li and C.-C. J. Kuo, "Progressive Coding of 3-D Graphic Models,"
Proceedings of the IEEE, vol. 86, no. 6, pp. 1052-1063, June 1998.
[Li 1998b] J. Li and C.-C. J. Kuo, "Compression of mesh connectivity by dual graph
approach (M1)," in MPEG-4. Tokyo, 1998b.
[Lin 2004] S. Lin and D. J. Costello, Error Control Coding: Fundamentals and Applications, 2nd ed., Pearson Prentice Hall, 2004.
[Linde 1980] Y. Linde, A. Buzo, and R. Gray, "An algorithm for vector quantizer
design," IEEE Transactions on Communications, vol. 28, no. 1, pp. 8495, Jan.
1980.
[Liu 2001] J. Liu and P. Moulin, "Information-Theoretic Analysis of Interscale and
Intrascale Dependencies between Image Wavelet Coefficients," IEEE
Transactions on Image Processing, vol. 10, no. 11, pp. 1647-1658, November
2001.
[Lounsbery 1997] M. Lounsbery, T. D. Derose, and J. Warren, "Multiresolution
analysis for surfaces of arbitrary topological type," ACM Transactions on
Graphics, vol. 16, no. 1, pp. 34-73, 1997.
[Luby 2001] M. Luby, M. Mitzenmacher, A. M. Shokrollahi, and D. A. Spielman,
"Improved low-density parity-check codes using irregular graphs," IEEE
Transactions on Information Theory, vol. 47, no. 2, pp. 585-598, Feb. 2001.
[Lucas 2000] R. Lucas, M. Fossorier, Y. Kou, and S. Lin, "Iterative decoding of
one-step majority-logic decodable codes based on belief propagation," IEEE
Transactions on Communications, vol. 48, no. 6, pp. 931-937, June 2000.
[MacKay 1999] D. MacKay, "Good error-correcting codes based on very sparse
matrices," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399-
431, March 1999.
[Morán 2004] F. Morán and N. García, "Comparison of wavelet-based 3-D model coding techniques," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 7, pp. 937-949, July 2004.
[Munteanu 1999a] A. Munteanu, J. Cornelis, G. Van der Auwera, and P. Cristea,
"Wavelet-based lossless compression scheme with progressive transmission
capability," International Journal of Imaging Systems and Technology, Special
Issue on Image and Video Coding, J. Robinson and R. D. Dony, Eds., vol. 10,
no. 1, pp. 76-85, January 1999.
[Munteanu 1999b] A. Munteanu, J. Cornelis, G. Van der Auwera, and P. Cristea,
"Wavelet Image Compression - The Quadtree Coding Approach," IEEE
Transactions on Information Technology in Biomedicine, vol. 3, no. 3, pp. 176-
185, September 1999.
[Munteanu 2003] A. Munteanu, "Wavelet Image Coding and Multiscale Edge
Detection: Algorithms and Applications," Electronics and Information
Processing Department (ETRO), Vrije Universiteit Brussel, Brussels, PhD
Thesis, 2003.
[Pajarola 2000] R. Pajarola and J. Rossignac, "Compressed progressive meshes,"
IEEE Transactions on Visualization and Computer Graphics, vol. 6, no. 1-3,
pp. 79-93, 2000.
[Papoulis 1987] A. Papoulis, Probability, Random Variables, and Stochastic
Processes. New York: McGraw-Hill, 1987.
[Park 2002] S.-B. Park, C.-S. Kim, and S.-U. Lee, "Progressive mesh compression
using cosine index predictor and 2-stage geometry predictor," Proceedings of
ICIP, pp. 233-236, September 2002.
[Park 2003] S.-B. Park, C.-S. Kim, and S.-U. Lee, "Error Resilient Coding of 3-D
Meshes," Proceedings of IEEE International Conference on Image Processing,
Barcelona, Spain, Vol. 1, pp. 773-776, September 14-17, 2003.
[Park 2006] S.-B. Park, C.-S. Kim, and S.-U. Lee, "Error Resilient 3-D Mesh
Compression," IEEE Transactions on Multimedia, vol. 8, no. 5, pp. 885-895,
October 2006.
[Payan 2006] F. Payan and M. Antonini, "Mean square error approximation for
wavelet-based semiregular mesh compression," IEEE Transactions on
Visualization and Computer Graphics, vol. 12, no. 4, pp. 649-657, July 2006.
[Pearlman 2004] W. A. Pearlman, A. Islam, N. Nagaraj, and A. Said, "Efficient,
low-complexity image coding with a set-partitioning embedded block coder,"
IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, pp.
1219-1235, November 2004.
[Peng 2005a] J. Peng, C.-S. Kim, and C.-C. J. Kuo, "Technologies for 3-D mesh
compression: A survey," Journal of Visual Communication and Image
Representation, vol. 16, pp. 688-733, 2005.
[Peng 2005b] J. Peng and C.-C. J. Kuo, "Geometry-guided progressive lossless 3-D
mesh coding with octree decomposition," ACM Transactions on Graphics, vol.
24, no. 3, pp. 609-616, 2005.

[Pereira 2002] F. Pereira and T. Ebrahimi, The MPEG-4 Book. Prentice Hall, 2002.
[Preda 2003] M. Preda, I. A. Salomie, F. Preteux, and G. Lafruit, "Virtual character
definition and animation within the MPEG-4 standard," in 3-D Modeling and
Animation: Synthesis and Analysis Techniques for the Human Body, M.
Strintzis and N. Sarris, Eds. Hershey, PA, USA: Idea Group Inc., 2003.
[Richardson 2001] T. J. Richardson, A. M. Shokrollahi, and R. Urbanke, "Design of
capacity-approaching irregular low-density parity-check codes," IEEE
Transactions on Information Theory, vol. 47, no. 2, pp. 619-637, Feb. 2001.
[Rissanen 1983] J. J. Rissanen, "A Universal Data Compression System," IEEE
Transactions on Information Theory, vol. 29, pp. 656-664, 1983.
[Rissanen 1984] J. J. Rissanen, "Universal Coding, Information, Prediction and
Estimation," IEEE Transactions on Information Theory, vol. 30, pp. 629-636,
1984.
[Rizzo 1997] L. Rizzo, "Effective erasure codes for reliable computer
communication protocols," ACM SIGCOMM Comput. Commun. Rev., vol. 27,
no. 2, pp. 24-36, 1997.
[Rossignac 1999] J. Rossignac, "Edgebreaker: Connectivity compression for triangle
meshes," IEEE Trans. Visual. Comput Graphics, vol. 5, pp. 4761, Jan.Mar.
1999.
[Ryan 2003] W. E. Ryan, "An Introduction to LDPC Codes," 2003, pp. 1-23.
[Said 1996a] A. Said and W. Pearlman, "A New Fast and Efficient Image Codec
Based on Set Partitioning in Hierarchical Trees," IEEE Transactions on
Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June
1996.
[Said 1996b] A. Said and W. Pearlman, "An image multiresolution representation
for lossless and lossy compression," IEEE Transactions on Image Processing,
vol. 5, pp. 1303-1310, September 1996.
[Salomie 2001] A. Salomie, R. Deklerck, and J. Cornelis, "System and method to
obtain surface structures of multi-dimensional objects, and to represent those
surface structures for animation, transmission and display," Patent application
EP 02075006.3, 2001.
[Salomie 2004a] I. A. Salomie, A. Munteanu, A. Gavrilescu, G. Lafruit, P.
Schelkens, R. Deklerck, and J. Cornelis, "MESHGRID A Compact, Multi-
Scalable and Animation-Friendly Surface Representation," IEEE Transactions
on Circuits and Systems for Video Technology, special issue on MPEG-4/AFX,
Editors M. Bourges-Sévenier, E. S. Jang, G. Lafruit, and F. Morán, vol. 14, no.
7, pp. 950-966, July 2004.
[Salomie 2004b] I. A. Salomie, A. Munteanu, A. Gavrilescu, G. Lafruit, P.
Schelkens, R. Deklerck, and J. Cornelis, "MeshGrid A Compact, Multi-
Scalable and Animation-Friendly Surface Representation," IEEE Transactions
on Circuits and Systems for Video Technology, vol. 14, no. 7, pp. 950-966, July
2004.

[Salomie 2005] I. A. Salomie, "Extraction, hierarchical representation and flexible
compression of surface meshes derived from 3-D data," PhD Thesis, Vrije
Universiteit Brussel, 2005.
[Satti 2009] S. Satti, L. Denis, A. Munteanu, J. Cornelis, and P. Schelkens,
"Estimation of interband and intraband statistical dependencies in wavelet-
based decomposition of meshes," Wavelet Applications in Industrial
Processing, 18-22 January 2009.
[Schelkens 2003] P. Schelkens, A. Munteanu, J. Barbarien, M. Galca, X. Giro-Nieto,
and J. Cornelis, "Wavelet Coding of Volumetric Medical Datasets," IEEE
Transactions on Medical Imaging, Special issue on "Wavelets in Medical
Imaging," Editors M. Unser, A. Aldroubi, and A. Laine, vol. 22, no. 3, pp. 441-
458, March 2003.
[Shapiro 1993] J. M. Shapiro, "Embedded Image Coding Using Zerotrees of
Wavelet Coefficients," IEEE Transactions on Signal Processing, vol. 41, no.
12, pp. 3445-3462, 1993.
[Stoufs 2008] M. R. Stoufs, "Scalable Joint Source-Channel Coding of Image and
Video Signals," Vrije Universiteit Brussel, Dec. 2008.
[Sweldens 1996] W. Sweldens, "The Lifting Scheme: a Custom Design
Construction of Biorthogonal Wavelets," Journal of Appl. and Comput.
Harmonic Analysis, vol. 3, no. 2, pp. 186-200, 1996.
[Sweldens 1998] W. Sweldens, "The lifting scheme: A construction of second
generation wavelets," SIAM J. Math. Analysis, vol. 29, no. 2, pp. 511-546,
1998.
[Tanner 1981] R. M. Tanner, "A recursive approach to low complexity codes," IEEE
Transactions on Information Theory, vol. 27, no. 5, pp. 533-547, Sept. 1981.
[Taubin 1998a] G. Taubin, A. Guéziec, W. Horn, and F. Lazarus, "Progressive
forest-split compression," Proceedings of SIGGRAPH 1998, pp. 123-132, 1998.
[Taubin 1998b] G. Taubin and J. Rossignac, "Geometric compression through
topological surgery," ACM Transactions on Graphics, vol. 17, no. 2, pp. 84-
115, 1998.
[Taubman 2002] D. Taubman and M. W. Marcellin, JPEG2000: Image Compression
Fundamentals, Standards, and Practice. Norwell, Massachusetts: Kluwer
Academic Publishers, 2002.
[Tian 2007a] D. Tian and G. Al-Regib, "Multistreaming of 3-D Scenes With
Optimized Transmission and Rendering Scalability," IEEE Transactions on
Multimedia, vol. 9, no. 4, pp. 736-745, June 2007.
[Tian 2007b] D. Tian, J. Li, and G. Al-Regib, "Joint source and channel coding for
3-D scene databases using vector quantization and embedded parity objects,"
IEEE Transactions on Image Processing, vol. 16, no. 6, June 2007.
[Touma 1998] C. Touma and C. Gotsman, "Triangle mesh compression,"
Proceedings of Graphics Interface Conf., Vancouver, Canada, Jun. 1998.
[Verdicchio 2006] F. Verdicchio, A. Munteanu, A. I. Gavrilescu, J. Cornelis, and P.
Schelkens, "Embedded Multiple Description Coding of Video," IEEE
Transactions on Image Processing, vol. 15, no. 10, pp. 3114-3130, Oct. 2006.
[Walsh 2002] A. E. Walsh and M. Bourges-Sévenier, MPEG-4 Jump-Start. Prentice
Hall, 2002.
[Wu 1996] X. Wu, N. Memon, and K. Sayood, "A Context-based, Adaptive,
Lossless/Near-Lossless Coding Scheme for Continuous-tone Images," ISO/IEC
SC29/WG1 N256, Epernay, France, 1996.
[Wu 1997a] X. Wu, W. K. Choi, and P. Bao, "L-infinity-constrained High-fidelity
Image Compression via Adaptive Context Modeling," Proceedings of DCC, pp.
91-100, 1997.
[Wu 1997b] X. Wu and N. Memon, "Context-based, Adaptive, Lossless Image
Coding," IEEE Transactions on Communications, vol. 45, no. 4, pp. 437-444.
[Wu 2000] X. Wu and P. Bao, "L-infinity-constrained High-fidelity Image
Compression via Adaptive Context Modeling," IEEE Transactions on Image
Processing, vol. 9, no. 4, pp. 536-542, April 2000.
[Yan 2001] Z. Yan, S. Kumar, and C. C. Kuo, "Error-Resilient Coding of 3-D
Graphic Models via Adaptive Mesh Segmentation," IEEE Transactions on
Circuits and Systems for Video Technology, vol. 11, no. 7, pp. 860-873, July
2001.
[Yan 2005] Z. Yan, S. Kumar, and C. C. Kuo, "Mesh Segmentation Schemes for
Error Resilient Coding of 3-D Graphic Models," IEEE Transactions on
Circuits and Systems for Video Technology, vol. 15, no. 1, pp. 138-144,
January 2005.
[Zhidong 2001] Y. Zhidong, S. Kumar, and C.-C. J. Kuo, "Error-resilient coding of
3-D graphic models via adaptive mesh segmentation," IEEE Transactions on
Circuits and Systems for Video Technology, vol. 11, no. 7, pp. 860-873, July
2001.




ACRONYMS
1D One Dimensional
2D Two Dimensional
3D Three Dimensional
3DMC 3D Mesh Coding
AFX Animation Framework eXtension
ARQ Automatic Repeat reQuest
AVC Advanced Video Coding
BAWGN Binary Additive White Gaussian Noise
BCH Bose-Chaudhuri-Hocquenghem
BEC Binary Erasure Channel
BER Bit Error Rate
BMS Binary Memoryless Symmetric
BPS Bits Per Second
BPV Bits Per Vertex
BSC Binary Symmetric Channel
CPM Compressed Progressive Mesh
CPU Central Processing Unit
CVS Coded Video Sequence
CWT Continuous Wavelet Transform
DCT Discrete Cosine Transform
DWT Discrete Wavelet Transform
EC Entropy Coding
EEP Equal Error Protection
EGPRS Enhanced General Packet Radio Service
EPV Erasure Protection Vector
EQ Embedded Quantization
EZW Embedded Zerotree Wavelet
FEC Forward Error Correction
FWT Fast Wavelet Transform
GPRS General Packet Radio Service
HDTV High-Definition TV
IDWT Inverse Discrete Wavelet Transform
IEC International Electrotechnical Commission

ISO International Organization for Standardization
JPEG Joint Photographic Experts Group
JSCC Joint Source and Channel Coding
JSVM Joint Scalable Video Model
JVT Joint Video Team
LDPC Low-Density Parity-Check
LOD Level Of Detail
MAXAD MAXimum Absolute Difference
MDC Multiple Description Coding
ME Motion Estimation
MPEG Moving Picture Experts Group
MRA Multi-Resolution Analysis
MSE Mean Squared Error
MV Motion Vector
NAL Network Abstraction Layer
NEP No Error Protection
PEG Progressive Edge Growth
PET Priority Encoding Transmission
PFS Progressive Forest Split
PSNR Peak Signal-to-Noise Ratio
QT-L QuadTree-Limited
RD Rate-Distortion
ROI Region-Of-Interest
RS Reed-Solomon
RTP Real-time Transport Protocol
RV Random Variable
SAD Sum of Absolute Differences
SAQ Successive Approximation Quantization
SDC Single Description Coding
SPECK Set Partitioning Embedded Block Coding
SPIHT Set Partitioning in Hierarchical Trees
SQP SQuare Partitioning
SVC Scalable Video Coding
TS Topological Surgery
UEP Unequal Error Protection
VD Valence-Driven Conquest
WSS Wavelet Subdivision Surfaces
