
SIP (2020), vol. 9, e13, page 1 of 17 © The Author(s), 2020.

Published by Cambridge University Press in association with Asia Pacific Signal and Information Processing Association. This is an Open Access article, distributed under the
terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any
medium, provided the original work is properly cited.
doi:10.1017/ATSIP.2020.12

industrial technology advances

An overview of ongoing point cloud compression standardization activities: video-based (V-PCC) and geometry-based (G-PCC)
d. graziosi,1 o. nakagami,2 s. kuma,2 a. zaghetto,1 t. suzuki2 and a. tabatabai1

This article presents an overview of the recent standardization activities for point cloud compression (PCC). A point cloud is a 3D data representation used in diverse applications associated with immersive media, including virtual/augmented reality, immersive telepresence, autonomous driving, and cultural heritage archival. The international standards body for media compression, also known as the Moving Picture Experts Group (MPEG), is planning to release two PCC standard specifications in 2020: video-based PCC (V-PCC) and geometry-based PCC (G-PCC). V-PCC and G-PCC will be part of the ISO/IEC 23090 series on the coded representation of immersive media content. In this paper, we provide a detailed description of both codec algorithms and their coding performances. Moreover, we will also discuss certain unique aspects of point cloud compression.

Keywords: Immersive media, MPEG-I, Point cloud compression, Video-based point cloud compression, V-PCC, ISO/IEC 23090-5,
Geometry-based point cloud compression, G-PCC, ISO/IEC 23090-9

Received 19 November 2019; Revised 9 March 2020

I. INTRODUCTION

Recent progress in visual capture technology has enabled the capture and digitization of points corresponding to a 3D world scene. Point clouds are one of the major 3D data representations, which provide, in addition to spatial coordinates, attributes (e.g. color or reflectance) associated with the points in a 3D world. Point clouds in their raw format require a huge amount of memory for storage or bandwidth for transmission. For example, a typical dynamic point cloud used for entertainment purposes usually contains about 1 million points per frame, which at 30 frames per second amounts to a total bandwidth of 3.6 Gbps if left uncompressed. Furthermore, the emergence of higher-resolution point cloud capture technology imposes, in turn, an even higher requirement on the size of point clouds. In order to make point clouds usable, compression is necessary.

The Moving Picture Experts Group (MPEG) has been producing internationally successful compression standards, such as MPEG-2 [1], AVC [2], and HEVC [3], for over three decades, many of which have been widely deployed in devices like televisions or smartphones. Standards are essential for the development of eco-systems for data exchange. In 2013, MPEG first considered the use of point clouds for immersive telepresence applications and conducted subsequent discussions on how to compress this type of data. As more point cloud use cases were made available and presented, a Call for Proposals (CfP) was issued in 2017 [4]. Based on the responses made to this CfP, two distinct compression technologies were selected for the point cloud compression (PCC) standardization activities: video-based PCC (V-PCC) and geometry-based PCC (G-PCC). The two standards are planned to be ISO/IEC 23090-5 and -9, respectively.

The goal of this paper is to give an insight into the MPEG PCC activity. We also describe the overall codec architectures, their elements and functionalities, as well as the coding performance specific to various use cases. We also provide some examples that demonstrate the available flexibility on the encoder side for higher coding performance and differentiation using this new data format. The paper is organized as follows. Section II offers a generic definition of a point cloud, along with use cases and a brief description of the current standardization activity. Section III describes the video-based approach for PCC, also known as V-PCC, with a discussion of encoder tools for PCC. In Section IV, there is a discussion of the geometry-based approach for PCC, or G-PCC, with examples of encoding tools. Section V presents performance metrics for both approaches, and Section VI concludes the article.

1 R & D Center US San Jose Laboratory, Sony Corporation of America, San Jose, USA
2 R & D Center, Sony Corporation, Shinagawa-ku, Japan

Corresponding author: D. Graziosi
Email: danillo.graziosi@sony.com

II. POINT CLOUD

In this section, we define common terms for point clouds, present use cases, and discuss briefly the ongoing standardization process and activities.

A) Definition, acquisition, and rendering

A point cloud is composed of a collection of points in a 3D space. Each point in the 3D space is associated with a geometry position together with the associated attribute information (e.g. color, reflectance, etc.). The 3D coordinates of a point are usually represented by floating-point values; they can however be quantized into integer values according to a pre-defined space precision. The quantization process creates a grid in 3D space, and all the points residing within each sub-grid volume are mapped to the sub-grid center coordinates, referred to henceforth as voxels. This process of converting the floating-point spatial coordinates to a grid-based coordinate representation is also known as voxelization. Each voxel can be considered a 3D extension of pixels corresponding to the 2D image grid coordinates. Notice that the space precision may affect the perceived quality of the point cloud, as shown in Fig. 1.
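To make the voxelization step concrete, the following is a minimal numpy sketch; it is an illustration rather than a normative procedure, and the grid precision and the rule for merging points that fall into the same voxel are assumptions chosen for simplicity.

```python
import numpy as np

def voxelize(points, colors, precision=1024):
    """Quantize floating-point coordinates onto a precision^3 voxel grid.

    points: (N, 3) float array; colors: (N, 3) array with one color per point.
    Returns unique integer voxel coordinates and one color per voxel.
    """
    # Normalize the cloud into the unit cube, then scale to the integer grid.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    voxels = np.floor((points - mins) / extent * (precision - 1)).astype(np.int64)

    # Points falling inside the same voxel are merged; here we simply keep the
    # first color encountered (an encoder could average them instead).
    voxels, first_idx = np.unique(voxels, axis=0, return_index=True)
    return voxels, colors[first_idx]
```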
The spatial information can be obtained using two different methods: passive or active. Passive methods use multiple cameras and perform image matching and spatial triangulation to infer the distance between the captured objects in 3D space and the cameras. Several variations of stereo matching algorithms [5] and multiview structure [6] have been proposed as passive methods for depth acquisition. Active methods use light sources (e.g. infra-red or lasers) and back-scattered reflected light to measure the distances between the objects and the sensor. Examples of active depth sensors are Microsoft Kinect [7], Apple TrueDepth Camera [8], Intel Realsense [9], Sony DepthSense [10], and many others.

Both active and passive depth acquisition methods can be used in a complementary manner to improve the generation of point clouds. The latest trend in capture technology is volumetric studios, where either passive methods (using RGB cameras only) or a combination of passive and active methods (using RGB and depth cameras) creates a high-quality point cloud. Examples of volumetric capture studios are 8i [11], Intel Studios [12], and Sony Innovation Studios [13].

Different applications can use the point cloud information to execute diverse tasks. In automotive applications, the spatial information may be used to prevent accidents, while in the entertainment industry the 3D coordinates provide an immersive experience to the user. For the entertainment use case, rendering systems based on point cloud information are also emerging, and solutions based on point cloud editing and rendering are already available (e.g. Nurulize [14] provides solutions for editing and visualization of volumetric data).

B) Use cases

There are accordingly many applications that use point clouds as the preferred data capture format. Here we explain some typical use cases, also shown in Fig. 2.

1) VR/AR
Dynamic point cloud sequences, such as the ones presented in [15], can provide the user with the capability to see moving content from any viewpoint: a feature that is also referred to as 6 Degrees of Freedom (6DoF). Such content is often used in virtual/augmented reality (VR/AR) applications. For example, in [19, 20], point cloud visualization applications using mobile devices were presented. Accordingly, by utilizing the available video decoder and GPU resources present in a mobile phone, V-PCC encoded point clouds were decoded and reconstructed in real time. Subsequently, when combined with an AR framework (e.g. ARCore, ARKit), the point cloud sequence can be overlaid on the real world through a mobile device.

2) Telecommunication
Because of its high compression efficiency, V-PCC enables the transmission of a point cloud video over a band-limited network. It can thus be used for tele-presence applications [16]. For example, a user wearing a head-mounted display device will be able to interact with the virtual world remotely by sending/receiving point clouds encoded with V-PCC.

Fig. 1. Original point cloud with floating-point coordinates (left) and voxelized point cloud with integer coordinates (right).


Fig. 2. Use case examples for point clouds, from left to right: VR/AR [15], Telepresence [16], autonomous vehicles [17], and world heritage [18].

3) Autonomous vehicle
Autonomous driving vehicles use point clouds to collect information about the surrounding environment to avoid collisions. Nowadays, to acquire 3D information, multiple visual sensors are mounted on the vehicles. The LIDAR sensor is one such example: it captures the surrounding environment as a time-varying sparse point cloud sequence [17]. G-PCC can compress this sparse sequence and therefore help to improve the dataflow inside the vehicle with a light and efficient algorithm.

4) World heritage
For a cultural heritage archive, an object is scanned with a 3D sensor into a high-resolution static point cloud [18]. Many academic/research projects generate high-quality point clouds of historical architecture or objects to preserve them and create digital copies for a virtual world. Laser range scanners [21] or Structure from Motion (SfM) [22] techniques are employed in the content generation process. Additionally, G-PCC can be used to losslessly compress the generated point clouds, reducing the storage requirements while preserving the accurate measurements.

C) MPEG standardization

Immersive telepresence was the first use case of point clouds that was discussed by MPEG back in 2013. The dynamic nature of points, distributed irregularly in space, tends to generate large amounts of data and thus provided initial evidence of the need for compression. Due to an increased interest from the industry in point clouds, MPEG collected requirements [23] and started the PCC standardization with a CfP [4] in January 2017. Three main categories of point cloud content were identified: static surfaces (category 1), dynamic surfaces (category 2), and dynamically-acquired LIDAR sequences (category 3). After the CfP was issued, several companies responded to the call with different compression technology proposals. At first, MPEG identified three distinct technologies: LIDAR point cloud compression (L-PCC) for dynamically acquired data, surface PCC (S-PCC) for static point cloud data, and V-PCC for dynamic content [24]. Due to similarities between S-PCC and L-PCC, the two proposals were later merged and referred to as G-PCC. The first test models were developed in October 2017, one for category 2 (TMC2) and another one for categories 1 and 3 (TMC13). Over the last 2 years, further improvements of the test models have been achieved through ongoing technical contributions, and the PCC standard specifications should be finalized in 2020. In the following sections, a more detailed technical description of both V-PCC and G-PCC codec algorithms is provided.

III. VIDEO-BASED POINT CLOUD COMPRESSION

In the following sub-sections, we describe the coding principle of V-PCC and explain some of its coding tools.

A) Projection-based coding principle

2D video compression is a successful technology that is readily available due to the widespread adoption of video coding standards. In order to take advantage of this technology, PCC methods may convert the point cloud data from 3D to 2D, which is then coded by 2D video encoders. During MPEG's CfP, several responses applied 3D to 2D projections [24]. The proposal that became the basis for the V-PCC test model [25] generated 3D surface segments by dividing the point cloud into a number of connected regions, called 3D patches. Then, each 3D patch is projected independently onto a 2D patch. This approach helped to reduce projection issues, such as self-occlusions and hidden surfaces, and offered a viable solution for the point cloud conversion problem. Additionally, orthographic projections were used to avoid resampling issues and allow lossless compression.

The projection of a 3D patch onto a 2D patch acts like a virtual orthographic camera, capturing a specific part of the point cloud. The point cloud projection process is analogous to having several virtual cameras registering parts of the point cloud, and combining those camera images into a mosaic, i.e. an image that contains the collection of projected 2D patches. This process results in a collection of metadata information associated with the projection of each patch, or analogously the description of each virtual capture camera.


Fig. 3. 3D Patch projection and respective occupancy map, geometry, and attribute 2D images, (a) 3D patch, (b) 3D Patch Occupancy Map, (c) 3D Patch Geometry
Image, (d) 3D Patch Texture Image.

For each atlas, we also have up to three associated images: (1) a binary image, called the occupancy map, which signals whether a pixel corresponds to a valid 3D projected point; (2) a geometry image that contains the depth information, i.e. the distance between each point's location and the projection plane; and (3) a number of attribute image(s), such as the texture (color) of each point, that is, a color image containing the R, G, and B components of the color information, or a material attribute, which may be represented by a monochrome image. For the V-PCC case, the geometry image in particular is similar to a depth map, since it registers the distance between the 3D point and the projection plane. Other types of geometry images, such as the one described in [26, 27], do not measure distances and use a different mapping algorithm to convert 3D information into 2D images. Moreover, this mapping was used with 3D meshes only and not point clouds.

Figure 3 provides an example of a 3D patch and its respective projection images. Note that in Fig. 3 we have illustrated only the texture of each projected point; other attribute images are also possible. The projection-based approach is well suited for dense sequences that, when projected, generate image-like continuous and smooth surfaces. For sparse point clouds, projection-based coding might not be efficient, and other methods like G-PCC, described later, might be more appropriate.

B) V-PCC codec architecture

This section discusses some of the main functional elements/tools of the V-PCC codec architecture, mainly: patch generation, packing, occupancy map, geometry, attribute(s), atlas image generation, image padding, and video compression. It should be noted that the functionalities provided by each of the listed tools can be implemented in different ways and methods. The specific implementation of these functionalities is non-normative and is not specified by the V-PCC standard. It thus becomes an encoder choice, allowing flexibility and differentiation.

During the standardization of V-PCC, a reference software (TMC2) has been developed in order to measure the coding performance and to compare the technical merits/advantages of proposed point cloud coding methods that are being considered by V-PCC. The TMC2 encoder applies several techniques to improve coding performance, including how to perform packing of patches, the creation of occupancy maps, geometry, and texture images, and the compression of occupancy map, geometry, texture, and patch information. Notice that TMC2 uses HEVC to encode the generated 2D videos, but using HEVC is not mandatory, and any appropriate image/video codec could also be used. Nevertheless, the architecture and results presented in this paper use HEVC as the base encoder. The TMC2 encoder architecture is illustrated in Fig. 4. The next sub-sections will provide details on each of the encoding blocks shown below.

1) Patch generation
Here we describe how to generate 3D patches in TMC2. First, the normal for each point is estimated [29]. Given the six orthographic projection directions (±x, ±y, ±z), each point is then associated with the projection direction that yields the largest dot product with the point's normal. The point classification is further refined according to the classification of the neighboring points' projection directions. Once the points' classification is finalized, points with the same projection direction (category) are grouped together using a connected components algorithm. As a result of this process, each connected component is referred to as a 3D patch. The 3D patch points are then projected orthogonally to one of the six faces of the axis-aligned bounding box based on their associated 3D patch projection direction. Notice that since the projection surface lies on a bounding box that is axis-aligned, two of the three coordinates of a point remain unchanged after projection, while the third coordinate is registered as the distance between the point's location in 3D space and the projection surface (Fig. 5).
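A rough sketch of the initial classification and projection just described is given below. It is an illustrative simplification, not the TMC2 implementation: the normal estimation, the neighborhood refinement, and the connected-components grouping are omitted, and the handling of the ± faces is reduced to a single depth origin.

```python
import numpy as np

# The six axis-aligned projection directions used for the initial classification.
AXES = np.array([[ 1, 0, 0], [-1, 0, 0],
                 [ 0, 1, 0], [ 0,-1, 0],
                 [ 0, 0, 1], [ 0, 0,-1]], dtype=float)

def classify_points(normals):
    """Assign each point to the projection direction whose dot product
    with the point normal is largest (initial classification only)."""
    scores = normals @ AXES.T            # (N, 6) dot products
    return np.argmax(scores, axis=1)     # index of the chosen direction

def project_patch(points, axis_index):
    """Orthographic projection of one patch onto a bounding-box face.
    Two coordinates are kept as the (u, v) image position, the third becomes
    the depth; for simplicity the depth origin is the minimum face here."""
    normal_axis = axis_index // 2        # 0:x, 1:y, 2:z
    tangent = [a for a in range(3) if a != normal_axis]
    origin = points.min(axis=0)
    uv = (points[:, tangent] - origin[tangent]).astype(int)
    depth = np.abs(points[:, normal_axis] - origin[normal_axis]).astype(int)
    return uv, depth
```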


Fig. 4. TMC2 (V-PCC reference test model) encoder diagram [28].


Fig. 5. Patch generation.

Since a 3D patch may have multiple points projected onto the same pixel location, TMC2 uses several "maps" to store these overlapped points. In particular, let us assume that we have a set of points H(u,v) being projected onto the same (u,v) location. Accordingly, TMC2 can, for example, decide to use two maps: a near map and a far map. The near map stores the point from H(u,v) with the lowest depth value D0. The far map stores the point with the highest depth value within a user-defined interval (D0, D0 + D). The user-defined interval size D represents the surface thickness and can be set by the encoder and used to improve geometry coding and reconstruction.

TMC2 has the option to code the far map either as a differential map from the near map in two separate video streams, or to temporally interleave the near and far maps into one single stream and code the absolute values [30]. In both cases, the interval size D can be used to improve the depth reconstruction process, since the reconstructed values must lie within the predefined interval (D0, D0 + D). The second map can also be dropped altogether, and only an interpolation scheme is used to generate the far map [31]. Furthermore, both frames can be sub-sampled and spatially interleaved in order to improve the reconstruction quality compared to reconstruction with one single map only, and the coding efficiency compared to sending both maps [32].

The far map can potentially carry the information of multiple elements mapped to the same location. In lossless mode, TMC2 modifies the far map image so that instead of coding a single point per sample, multiple points can be coded using an enhanced delta-depth (EDD) code [33]. The EDD code is generated by setting its bits to one to indicate whether a position is occupied between the near depth value D0 and the far depth value D1. This code is then added on top of the far map. In this way, multiple points can be represented using just two maps.
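The near/far map construction with a surface-thickness interval can be sketched as follows. This is a simplified illustration under assumed conventions (per-patch image coordinates, a hypothetical helper name, and an arbitrary default thickness); the EDD extension is not shown.

```python
import numpy as np

def build_depth_maps(uv, depth, width, height, thickness=4):
    """Build near (D0) and far (D1) maps for one projected patch.

    uv: (N, 2) integer pixel positions, depth: (N,) integer depths.
    The far map keeps only points inside (D0, D0 + thickness].
    """
    near = np.full((height, width), -1, dtype=np.int32)
    far = np.full((height, width), -1, dtype=np.int32)
    for (u, v), d in zip(uv, depth):          # keep the closest point per pixel
        if near[v, u] < 0 or d < near[v, u]:
            near[v, u] = d
    for (u, v), d in zip(uv, depth):          # farthest point within the interval
        d0 = near[v, u]
        if d0 < d <= d0 + thickness and d > far[v, u]:
            far[v, u] = d
    far = np.where((far < 0) & (near >= 0), near, far)   # no second point: D1 = D0
    return near, far
```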
It is however still possible for some points to be missing, and the usage of only six projection directions may limit the reconstruction quality of the point cloud. In order to improve the reconstruction of arbitrarily oriented surfaces, 12 new modes corresponding to cameras positioned at 45-degree directions were added to the standard [34]. Notice that the rotation of the point cloud by 45 degrees can be modeled using integer operations only, thus avoiding resampling issues caused by rotations.

To generate images suitable for video coding, TMC2 has added the option to perform filtering operations. For example, TMC2 defines a block size T × T (e.g. T = 16), within which depth values cannot have large variance, and points with depth values above a certain threshold are removed from the 3D patch. Furthermore, TMC2 defines a range to represent the depth, and if a depth value is larger than the allowed range, it is taken out of the 3D patch. Notice that the points removed from the 3D patch will later be analyzed by the connected components algorithm and may be used to generate a new 3D patch. Further improvements in 3D patch generation include using color information for plane segmentation [35] and adaptively selecting the best plane for depth projection [36].

TMC2 has the option to limit the minimum number of points in a 3D patch as a user-defined parameter. It therefore will not allow the generation of 3D patches that contain fewer points than the specified minimum. Still, if these rejected points are within a certain user-defined threshold from the encoded point cloud, they may also not be included in any other regular patch. For lossy coding, TMC2 might decide to simply ignore such points. For lossless coding, on the other hand, points that are not included in any patch may be encoded by additional patches. These additional patches, known as auxiliary patches, store the (x, y, z) coordinates directly in the geometry image D0 created for regular patches, for 2D video coding. Alternatively, auxiliary patches can be stored in a separate image and coded as an enhancement image.

2) Patch packing
Patch packing refers to the placement of the projected 2D patches in a 2D image of size W × H. This is an iterative process: first, the patches are ordered by their sizes. Then, the location of each patch is determined by an exhaustive search in raster scan order, and the first location that guarantees overlap-free insertion (i.e. all the T × T blocks occupied by the patch are not occupied by another patch in the atlas image) is selected. To improve fitting chances, eight different patch orientations (four rotations combined with or without mirroring) are also allowed [37].


Fig. 6. Example of patch packing.

Blocks that contain pixels with valid depth values (as indicated by the occupancy map) and belonging to the area covered by the patch (rounded to a multiple of T) are considered occupied blocks and cannot be used by other patches, therefore guaranteeing that every T × T block is associated with only one unique patch. In the case that there is no empty space available for the next patch, the height H of the image is doubled, and the insertion of this patch is evaluated again. After insertion of all patches, the final height is trimmed to the minimum needed value. Figure 6 shows an example of the packed patches' occupancy map, geometry, and texture images.
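The raster-scan placement described above can be sketched in a few lines. The sketch below is a simplified illustration: it omits the orientation search and the height-doubling retry that the surrounding text describes, and it works directly on the grid of T × T blocks.

```python
import numpy as np

def place_patch(atlas_occupied, patch_blocks):
    """Find the first raster-scan position where a patch fits.

    atlas_occupied: (H, W) bool grid of T x T blocks already used.
    patch_blocks:   (h, w) bool grid of blocks covered by the patch.
    Returns (row, col) of the insertion point, or None if it does not fit.
    """
    H, W = atlas_occupied.shape
    h, w = patch_blocks.shape
    for row in range(H - h + 1):
        for col in range(W - w + 1):
            window = atlas_occupied[row:row + h, col:col + w]
            if not np.any(window & patch_blocks):        # overlap-free?
                atlas_occupied[row:row + h, col:col + w] |= patch_blocks
                return row, col
    return None   # caller would double the atlas height and retry
```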
To improve compression efficiency, patches with similar content should be placed in similar positions across time. For the generation of a temporally consistent packing, TMC2 finds matches between patches of different frames and tries to insert matched patches at a similar location. For the patch matching operation, TMC2 uses the intersection over union (IOU) to find the amount of overlap between two projected patches [38], according to the following equation:

Q_{i,j} = \frac{\cap(\text{preRect}[i], \text{Rect}[j])}{\cup(\text{preRect}[i], \text{Rect}[j])},   (1)

where preRect[i] is the 2D bounding box of patch[i] from the previous frame projected in 2D space, Rect[j] is the bounding box of patch[j] from the current frame projected in 2D space, ∩(preRect[i], Rect[j]) is the intersection area between preRect[i] and Rect[j], and ∪(preRect[i], Rect[j]) is the union area between preRect[i] and Rect[j].

The maximum IOU value determines a potential match between patch[i] from the previous frame and patch[j] from the current frame. If the IOU value is larger than a pre-defined threshold, the two patches are considered to be matched, and information from the previous frame is used for packing and coding of the current patch. For example, TMC2 uses the same patch orientation and normal direction for matched patches, and only differentially encodes the patch position and bounding box information.
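Equation (1) and the matching decision translate directly into code. In the sketch below, the rectangles are assumed to be (x, y, width, height) tuples in atlas coordinates, and the matching threshold is an arbitrary illustrative value rather than the TMC2 default.

```python
def iou(prev_rect, rect):
    """Intersection over union of two 2D bounding boxes (eq. (1)).
    Each rectangle is (x, y, width, height)."""
    x0 = max(prev_rect[0], rect[0])
    y0 = max(prev_rect[1], rect[1])
    x1 = min(prev_rect[0] + prev_rect[2], rect[0] + rect[2])
    y1 = min(prev_rect[1] + prev_rect[3], rect[1] + rect[3])
    inter = max(0, x1 - x0) * max(0, y1 - y0)
    union = prev_rect[2] * prev_rect[3] + rect[2] * rect[3] - inter
    return inter / union if union > 0 else 0.0

def best_match(prev_rects, rect, threshold=0.5):
    """Index of the previous-frame patch with the highest IOU against the
    current patch, or None when no candidate exceeds the threshold."""
    scores = [iou(p, rect) for p in prev_rects]
    if not scores:
        return None
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > threshold else None
```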
The patch matching information can be used to place the patches in a consistent manner across time. In [39], space in the atlas is pre-allocated to avoid patch collision and guarantee temporal consistency. Further optimizations of the patch locations may consider different aspects besides only compression performance. For example, in [40], the patch placement process was modified to allow low-delay decoding, that is, the decoder does not need to wait for the information of all the patches to extract the block-to-patch information from the projected 2D images, and can immediately identify which pixels belong to which patches as they are being decoded.

3) Geometry and occupancy maps

In contrast to the geometry images defined in [26, 27], which store (x,y,z) values in three different channels, geometry images in V-PCC store the distance between the missing coordinate of a point's 3D position and the projection surface in the 3D bounding box, using only the luminance channel of a video sequence. Since patches can have arbitrary shapes, some pixels may stay empty after patch packing. To differentiate between the pixels in the geometry video that are used for 3D reconstruction and the unused pixels, TMC2 transmits an occupancy map associated with each point cloud frame. The occupancy map has a user-defined precision of B × B blocks, where for lossless coding B = 1 and for lossy coding usually B = 4 is used, with visually acceptable quality, while significantly reducing the number of bits required to encode the occupancy map.

The occupancy map is a binary image that is coded using a lossless video encoder [41]. The value 1 indicates that there is at least one valid pixel in the corresponding B × B block in the geometry video, while a value of 0 indicates an empty area that was filled with pixels from the image padding procedure. For video coding, a binary image of dimensions (W/B, H/B) is packed into the luminance channel and coded losslessly, but lossy coding of the occupancy map is also possible [42]. Notice that for the lossy case, when the image resolution is reduced, the occupancy map image needs to be upscaled to a specified nominal resolution. The upscaling process could lead to extra inaccuracies in the occupancy map, which has the end effect of adding points to the reconstructed point cloud.
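The block-precision occupancy map and its decoder-side expansion can be sketched as follows (assuming, for simplicity, image dimensions divisible by the block size). The extra points mentioned above come from the coarse upscaling marking whole blocks as valid.

```python
import numpy as np

def downscale_occupancy(occupancy, block=4):
    """Reduce a full-resolution binary occupancy map to B x B precision.
    A block is marked occupied if it contains at least one valid pixel."""
    H, W = occupancy.shape
    blocks = occupancy.reshape(H // block, block, W // block, block)
    return blocks.any(axis=(1, 3)).astype(np.uint8)

def upscale_occupancy(coarse, block=4):
    """Decoder-side expansion back to the nominal resolution; every pixel of
    an occupied block is treated as valid, which may add extra points."""
    return np.kron(coarse, np.ones((block, block), dtype=np.uint8))
```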


4) Image padding, group dilation, re-coloring, and video compression

For geometry images, TMC2 fills the empty space between patches using a padding function that aims to generate a piecewise smooth image suited for higher-efficiency video compression. Each block of T × T (e.g. 16 × 16) pixels is processed independently. If the block is empty (i.e. there are no valid depth values inside the block), the pixels of the block are filled by copying either the last row or column of the previous T × T block in raster scan order. If the block is full (i.e. all pixels have valid depth values), padding is not necessary. If the block has both valid and non-valid pixels, then the empty positions are iteratively filled with the average value of their non-empty neighbors. The padding procedure, also known as geometry dilation, is performed independently for each frame. However, the empty positions for both near and far maps are the same, and using similar values can improve compression efficiency. Therefore, a group dilation is performed, where the padded values of the empty areas are averaged, and the same value is used for both frames [43].
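A simplified sketch of the per-block dilation for partially filled blocks is shown below; it ignores the raster-scan copy used for fully empty blocks and is not the TMC2 code. Group dilation would then average the padded values obtained for the near and far maps and reuse the result in both frames.

```python
import numpy as np

def dilate_block(block, occupied):
    """Iteratively fill empty pixels of one T x T geometry block with the
    average of their already-filled 4-neighbors (simplified dilation)."""
    block = block.astype(float).copy()
    filled = occupied.astype(bool).copy()
    while not filled.all():
        progress = False
        for r, c in zip(*np.where(~filled)):
            vals = [block[rr, cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < block.shape[0] and 0 <= cc < block.shape[1]
                    and filled[rr, cc]]
            if vals:
                block[r, c] = sum(vals) / len(vals)
                filled[r, c] = True
                progress = True
        if not progress:          # empty region with no filled seed pixels
            block[~filled] = block[filled].mean() if filled.any() else 0
            break
    return block
```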
A common raw format used as input for video encoding is YUV420, with 8-bit luminance and sub-sampled chroma channels. TMC2 packs the geometry image into the luminance channel only, since geometry represented by distance has only one component. Careful consideration of the encoding GOP structure can also lead to significant compression efficiency gains [44]. In the case of lossless coding, the x-, y-, and z-coordinates of points that could not be represented in regular patches are block interleaved and directly stored in the luminance channel. Moreover, 10-bit profiles such as the one in HEVC encoders [45] can improve accuracy and coding performance. Additionally, hints for the placement of patches in the atlas can be given to the encoder to improve the motion estimation process [46].

Since the reconstructed geometry can be different from the original one, TMC2 transfers the color from the original point cloud to the decoded point cloud and uses these new color values for transmission. The recoloring procedure [47] considers the color value of the nearest point from the original point cloud, as well as a neighborhood of points close to the reconstructed point, to determine a possibly better color value. Once the color values are known, TMC2 maps the color from 3D to 2D using the same mapping applied to geometry. To pad the color image, TMC2 may use a procedure based on mip-map interpolation, further improved by a sparse linear optimization model [48], as visualized in Fig. 7. The mip-map interpolation creates a multi-resolution representation of the texture image guided by the occupancy map, preserving active pixels even when they are down-sampled together with empty pixels. A sparse linear optimization based on Gauss–Seidel relaxation can be used to fill in the empty pixels at each resolution scale. The lower resolutions are then used as initial values for the optimization of the up-sampled images at the higher scale. In this way, the background is smoothly filled with values that are similar to the edges of the patches. Furthermore, the same principle of averaging positions that were previously empty areas of the maps (group dilation for geometry images) can be used for the attributes as well. The sequence of padded images is then color converted from RGB444 to YUV420 and coded with traditional video encoders.

5) Duplicates pruning, geometry smoothing, and attribute smoothing

The reconstruction process uses the decoded bitstreams for occupancy map, geometry, and attribute images to reconstruct the 3D point cloud.

Fig. 7. Mip-map image texture padding with sparse linear optimization.


When TMC2 uses two maps (the near and far maps) and the values from the two depth images are the same, TMC2 may generate several duplicate points. This can have an impact on quality, as well as creating unwanted points for lossless coding. To overcome this issue, the reconstruction process is modified to create only one point per (u,v) coordinate of the patch when the coordinates stored in the near map and the far map are equal [49], effectively pruning duplicate points.

The compression of geometry and attribute images and the additional points introduced due to occupancy map subsampling may introduce artifacts, which could affect the reconstructed point cloud. TMC2 can use techniques to improve the local reconstruction quality. Notice that similar post-processing methods can be signaled and performed at the decoder side. For instance, to reduce possible geometry artifacts caused by segmentation, TMC2 may smooth the points at the boundary of patches using a process known as 3D geometry smoothing [50]. One potential candidate method for point cloud smoothing identifies the points at patch edges and calculates the centroid of the decoded points in a small 3D grid. After the centroid and the number of points in the 2 × 2 × 2 grid are derived, a commonly used trilinear filter is applied.
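One possible rendition of the grid-centroid idea is sketched below. It is a strong simplification under assumed conventions: boundary points are simply snapped to the centroid of their own cell, and the trilinear filtering across neighboring cells mentioned above is omitted.

```python
import numpy as np

def smooth_boundary_points(points, boundary_mask, grid=2):
    """Move patch-boundary points toward the centroid of the decoded points
    that fall into the same coarse grid cell (illustrative simplification)."""
    cells = np.floor(points / grid).astype(np.int64)
    sums, counts = {}, {}
    for cell, p in zip(map(tuple, cells), points):
        sums[cell] = sums.get(cell, 0) + p
        counts[cell] = counts.get(cell, 0) + 1
    smoothed = points.astype(float).copy()
    for i in np.where(boundary_mask)[0]:
        cell = tuple(cells[i])
        smoothed[i] = sums[cell] / counts[cell]
    return smoothed
```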
Due to the point cloud segmentation process and the patch allocation, which may place patches with different attributes near each other in the images, color values at patch boundaries may be prone to visual artifacts as well. Furthermore, coding blocking artifacts may create reconstruction patterns that are not smooth across patch boundaries, creating visible seam artifacts in the reconstructed point cloud. TMC2 has the option to perform attribute smoothing [51] to reduce the seam effect on the reconstructed point cloud. Additionally, the attribute smoothing is performed in 3D space, to utilize the correct neighborhood when estimating the new smoothed value.

6) Atlas compression
Many of the techniques explained before have non-normative aspects, meaning that they are performed at the encoder only, and the decoder does not need to be aware of them. However, some aspects, especially related to the metadata that is used to reconstruct the 3D data from the 2D video data, are normative and need to be sent to the decoder for proper processing. For example, the position and the orientation of patches, the block size used in the packing, and others are all atlas metadata that need to be transmitted to the decoder.

The V-PCC standard defines an atlas metadata stream based on NAL units, where the patch data information is transmitted. The NAL unit structure is similar to the bitstream structure used in HEVC [3], which allows for greater encoding flexibility. A description of the normative syntax elements, as well as an explanation of coding techniques for the metadata stream, such as inter prediction of patch data, can be found in the current standard specification [52].

IV. GEOMETRY-BASED POINT CLOUD CODING

In the following sub-sections, we describe the coding principle of G-PCC and explain some of its coding tools.

A) Geometry-based coding principle

While the V-PCC coding approach is based on 3D to 2D projections, G-PCC, on the contrary, encodes the content directly in 3D space. In order to achieve that, G-PCC utilizes data structures, such as an octree that describes the point locations in 3D space, which is explained in further detail in the next section. Furthermore, G-PCC makes no assumption about the input point cloud coordinate representation. The points have an internal integer-based value, converted from a floating-point value representation. This conversion is conceptually similar to voxelization of the input point cloud, and can be achieved by scaling, translation, and rounding.

Another key concept for G-PCC is the definition of tiles and slices to allow parallel coding functionality [53]. In G-PCC, a slice is defined as a set of points (geometry and attributes) that can be independently encoded and decoded. In its turn, a tile is a group of slices with bounding box information. A tile may overlap with another tile, and the decoder can decode a partial area of the point cloud by accessing specific slices.

One limitation of the current G-PCC standard is that it is only defined for intra prediction, that is, it does not currently use any temporal prediction tool. Nevertheless, techniques based on point cloud motion estimation and inter prediction are being considered for the next version of the standard.

B) G-PCC codec architecture: TMC13

Figure 8 shows a block diagram depicting the G-PCC reference encoder [54], also known as TMC13, which will be described in the next sections. It is not meant to represent TMC13's complete set of functionalities, but only some of its core modules. First, one can see that geometry and attributes are encoded separately. However, attribute coding depends on decoded geometry. As a consequence, point cloud positions are coded first.

Source geometry points may be represented by floating-point numbers in a world coordinate system. Thus, the first step of geometry coding is to perform a coordinate transformation followed by voxelization. The second step consists of the geometry analysis using the octree or trisoup scheme, as discussed in Sections IV.B.1 and IV.B.2 below. Finally, the resulting structure is arithmetically encoded. Regarding attribute coding, TMC13 supports an optional conversion from RGB to YCbCr [55]. After that, one of the three available transform tools is used, namely, the Region Adaptive Hierarchical Transform (RAHT), the Predicting Transform, or the Lifting Transform. These transforms are discussed in Section IV.B.3.


Fig. 8. G-PCC reference encoder diagram.

Fig. 9. First two steps of an octree construction process.

Following the transform, the coefficients are quantized and arithmetically encoded.

1) Octree coding
The voxelized point cloud is represented using an octree structure [56] in a lossless manner. Let us assume that the point cloud is contained in a quantized volume of D × D × D voxels. Initially, the volume is segmented vertically and horizontally into eight sub-cubes with dimensions D/2 × D/2 × D/2 voxels, as exemplified in Fig. 9. This process is recursively repeated for each occupied sub-cube until D is equal to 1. It is noteworthy to point out that in general only about 1% of voxel positions are occupied [57], which makes octrees very convenient for representing the geometry of a point cloud. In each decomposition step, it is verified which blocks are occupied and which are not. Occupied blocks are marked as 1 and unoccupied blocks are marked as 0. The octets generated during this process represent an octree node occupancy state in a 1-byte word and are compressed by an entropy coder considering the correlation with neighboring octets. For the coding of isolated points, since there are no other points within the volume to correlate with, an alternative method to entropy coding the octets, namely the Direct Coding Mode (DCM), is introduced [58]. In DCM, the coordinates of the point are directly coded without performing any compression. The DCM mode is inferred from neighboring nodes in order to avoid signaling the usage of DCM for all nodes of the tree.
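The breadth-first serialization of occupancy bytes can be sketched as follows; the entropy coding with neighboring-node contexts and the DCM path are intentionally left out, and the function name is illustrative.

```python
import numpy as np

def octree_occupancy(points, depth):
    """Serialize point positions as one occupancy byte per octree node.

    points: (N, 3) integer array with coordinates in [0, 2**depth).
    Returns the occupancy bytes in breadth-first order; in G-PCC these
    bytes would then be entropy coded with neighbor-based contexts.
    """
    nodes = [np.asarray(points)]
    stream = []
    for level in range(depth):
        shift = depth - 1 - level              # bit that selects the child
        next_nodes = []
        for pts in nodes:
            # Child index 0..7 from the x, y, z bits at the current level.
            child = ((pts >> shift) & 1) @ np.array([4, 2, 1])
            byte = 0
            for c in range(8):
                sub = pts[child == c]
                if len(sub):
                    byte |= 1 << c
                    next_nodes.append(sub)
            stream.append(byte)
        nodes = next_nodes
    return stream
```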
2) Surface approximation via trisoup
Alternatively, the geometry may be represented by a pruned octree, constructed from the root to an arbitrary level, where the leaves represent occupied sub-blocks that are larger than a voxel. The object surface is approximated by a series of triangles, and since there is no connectivity information that relates the multiple triangles, the technique is called "triangle soup" (or trisoup). It is an optional coding tool that improves the subjective quality at lower bitrates, as quantization alone gives only a rough rate adaptation. If trisoup is enabled, the geometry bitstream becomes a combination of octree, segment indicator, and vertex position information. In the decoding process, the decoder calculates the intersection points between the trisoup mesh plane and the voxelized grid, as shown in [59]. The number of points derived in the decoder is determined by the voxel grid distance d, which can be controlled [59] (Fig. 10).

3) Attribute encoding
In G-PCC, there are three methods for attribute coding: (a) the RAHT [57]; (b) the Predicting Transform [60]; and (c) the Lifting Transform [61]. The main idea behind RAHT is to use the attribute values in a lower octree level to predict the values in the next level. The Predicting Transform implements an interpolation-based hierarchical nearest-neighbor prediction scheme. The Lifting Transform is built on top of the Predicting Transform but has an extra update/lifting step.


Because of that, from this point forward they will be jointly referred to as the Predicting/Lifting Transform. The user is free to choose either of the above-mentioned transforms. However, given a specific context, one method may be more appropriate than the other. The common criterion that determines which method to use is a combination of rate-distortion performance and computational complexity. In the next two sections, the RAHT and Predicting/Lifting attribute coding methods will be described.

RAHT Transform. The RAHT is performed by considering the octree representation of the point cloud. In its canonical formulation [57], it starts from the leaves of the octree (highest level) and proceeds backwards until it reaches its root (lowest level). The transform is applied to each node and is performed in three steps, one in each of the x, y, and z directions, as illustrated in Fig. 11. At each step, the low-pass g_n and high-pass h_n coefficients are generated.

Fig. 10. Trisoup point derivation at the decoder.

Fig. 11. Transform process of a 2 × 2 × 2 block.

Fig. 12. Example of upconverted transform domain prediction in RAHT.


Fig. 13. Levels of detail generation process.

RAHT is a Haar-inspired hierarchical transform. Thus, it can be better understood if a 1D Haar transform is taken as an initial example. Consider a signal v with N elements. The Haar decomposition of v generates g and h, which are the low-pass and high-pass components of the original signal, each one with N/2 elements. The n-th coefficients of g and h are calculated using the following equation:

\begin{bmatrix} g_n \\ h_n \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} v_{2n} \\ v_{2n+1} \end{bmatrix}.   (2)

The transform can be performed recursively, taking the current g as the new input signal v; at each recursion the number of low-pass coefficients is divided by a factor of 2. The g component can be interpreted as a scaled sum of equal-weighted consecutive pairs of v, and the h component as their scaled difference. However, if one chooses to use the Haar transform to encode point clouds, the transform must be modified to take the sparsity of the input point cloud into account. This can be accomplished by allowing the weights to adapt according to the distribution of points. Hence, the recursive implementation of the RAHT can be defined as follows:

\begin{bmatrix} g_n^l \\ h_n^l \end{bmatrix} = T \begin{bmatrix} g_{2n}^{l+1} \\ g_{2n+1}^{l+1} \end{bmatrix}, \qquad T = \frac{1}{\sqrt{w_1 + w_2}} \begin{bmatrix} \sqrt{w_1} & \sqrt{w_2} \\ -\sqrt{w_2} & \sqrt{w_1} \end{bmatrix},   (3)

w_n^l = w_1 + w_2,   (4)

w_1 = w_{2n}^{l+1}, \qquad w_2 = w_{2n+1}^{l+1},   (5)

where l is the decomposition level, w_1 and w_2 are the weights associated with the g_{2n}^{l+1} and g_{2n+1}^{l+1} low-pass coefficients at level l + 1, and w_n^l is the weight of the low-pass coefficient g_n^l at level l. As a result, higher weights are applied to the points in dense areas, so that the RAHT can balance the signals in the transform domain better than the non-adaptive transform.
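A single RAHT merge step, written directly from equation (3), is shown below; looping it over the occupied octree nodes and the x, y, and z directions, and carrying the accumulated weights upward, yields the full transform. The toy values at the end are only an illustration.

```python
import numpy as np

def raht_butterfly(g1, g2, w1, w2):
    """One RAHT merge step (eq. (3)): combine two occupied child nodes with
    accumulated weights w1, w2 into a low-pass and a high-pass coefficient."""
    a = np.sqrt(w1 / (w1 + w2))
    b = np.sqrt(w2 / (w1 + w2))
    low = a * g1 + b * g2          # carried to the next (coarser) level
    high = -b * g1 + a * g2        # quantized and entropy coded
    return low, high, w1 + w2      # eq. (4): the new weight is w1 + w2

# Toy usage: two voxels with attribute values 100 and 140 and unit weights;
# with w1 = w2 = 1 this reduces to the plain Haar step of eq. (2).
low, high, w = raht_butterfly(100.0, 140.0, 1.0, 1.0)
```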
A fixed-point formulation of RAHT was proposed in [62]. It is based on matrix decompositions and scaling of quantization steps. Simulations showed that the fixed-point implementation can be considered equivalent to its floating-point counterpart.

Most recently, a transform domain prediction in RAHT has been proposed [63] and is available in the current test model TMC13. The main idea is that, for each block, the transformed upconverted sum of attributes at level d, calculated from the decoded sum of attributes at level d–1, is used as a prediction of the transformed sum of attributes at level d, generating high-pass residuals that can be further quantized and entropy coded. The upconverting process is accomplished by means of a weighted average of neighboring nodes. Figure 12 shows a simplified illustration of the previously described prediction scheme. Reported gains over the RAHT formulation without prediction show significant improvements in a rate-distortion sense (up to around 30% overall average gains for color and 16% for reflectance).

Predicting/Lifting Transform. The Predicting Transform is a distance-based prediction scheme for attribute coding [60]. It relies on a Level of Detail (LoD) representation that distributes the input points into sets of refinement levels (R) using a deterministic Euclidean distance criterion. Figure 13 shows an example of a sample point cloud organized in its original order and reorganized into three refinement levels, as well as the corresponding Levels of Detail (LoD0, LoD1, and LoD2). One may notice that a level of detail l is obtained by taking the union of the refinement levels from 0 to l.

The attributes of each point are encoded using a prediction determined by the LoD order. Using Fig. 13 as an illustration, consider LoD0 only. In this specific case, the attributes of P2 can be predicted by the reconstructed versions of its nearest neighbors, P4, P5, or P0, or by a distance-based weighted average of these points. The maximum number of prediction candidates can be specified, and the number of nearest neighbors is determined by the encoder for each point. In addition, a neighborhood variability analysis [60] is performed. If the maximum difference between any two attributes in the neighborhood of a given point P is higher than a threshold, a rate-distortion optimization procedure is used to select the best predictor. By default, the attribute values of a refinement level R(j) are always predicted using the attribute values of its k-nearest neighbors in the previous LoD, that is, LoD(j–1). However, prediction within the same refinement level can be performed by setting a flag to 1 [64], as shown in Fig. 14.

Fig. 14. LoD referencing scheme.
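An illustrative sketch of the nearest-neighbor prediction from the previous LoD is given below. The inverse-distance weighting is one simple choice of predictor; the encoder-side predictor selection and variability analysis described above are omitted, and the function name and default k are assumptions.

```python
import numpy as np

def predict_attributes(ref_points, ref_attrs, new_points, k=3):
    """Predict the attributes of a refinement level from the k nearest
    neighbors in the previous LoD using inverse-distance weights.

    ref_points: (M, 3), ref_attrs: (M, c), new_points: (N, 3).
    """
    preds = np.empty((len(new_points), ref_attrs.shape[1]))
    for i, p in enumerate(new_points):
        d2 = np.sum((ref_points - p) ** 2, axis=1)
        nn = np.argsort(d2)[:k]                       # k nearest neighbors
        w = 1.0 / np.maximum(d2[nn], 1e-9)            # inverse-distance weights
        preds[i] = (w[:, None] * ref_attrs[nn]).sum(axis=0) / w.sum()
    return preds

# The residuals new_attrs - predict_attributes(...) are then quantized and coded.
```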


Fig. 15. Forward and Inverse Predicting Transform.

The Predicting Transform is implemented using two operators based on the LoD structure, which are the split and merge operators. Let L(j) and H(j) be the sets of attributes associated with LoD(j) and R(j), respectively. The split operator takes L(j + 1) as an input and returns the low-resolution samples L(j) and the high-resolution samples H(j). The merge operator takes L(j) and H(j) and returns L(j + 1). The predicting scheme is illustrated in Fig. 15. Initially, the attribute signal L(N + 1), which represents the whole point cloud, is split into H(N) and L(N). Then L(N) is used to predict H(N) and the residual D(N) is calculated. After that, the process goes on recursively. The reconstructed attributes are obtained through the cascade of merge operations.

The Lifting Transform [61], represented in the diagram of Fig. 16, is built on top of the Predicting Transform. It introduces an update operator and an adaptive quantization strategy. In the LoD prediction scheme, each point is associated with an influence weight. Points in lower LoDs are used more often and, therefore, impact the encoding process more significantly. The update operator determines U(j) based on the residual D(j) and then updates the value of L(j) using U(j), as shown in Fig. 16. The update signal U(j) is a function of the residual D(j), the distances between the predicted point and its neighbors, and their corresponding weights. Finally, to guide the quantization process, the transformed coefficients associated with each point are multiplied by the square root of their respective weights.
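The following sketch shows one forward and one inverse Predicting/Lifting step built from a prediction weight matrix. The update operator used here (a scaled transpose of the predictor) is only an illustrative choice that keeps the step invertible; it is not the TMC13 update rule, and the weight-based coefficient scaling and quantization are omitted.

```python
import numpy as np

def lifting_forward(L, H, W, strength=0.5):
    """One forward Predicting/Lifting step (simplified sketch).

    L: (m, c) attributes of the lower LoD, H: (n, c) attributes of the
    refinement level, W: (n, m) prediction weights (each row sums to 1).
    """
    D = H - W @ L                       # predict H from L, keep residuals
    U = strength * W.T @ D              # update signal fed back into L
    return L + U, D                     # updated low-pass signal, residuals

def lifting_inverse(L_out, D, W, strength=0.5):
    """Exact inverse of the step above, showing the scheme is reversible."""
    U = strength * W.T @ D
    L = L_out - U
    H = D + W @ L
    return L, H
```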
V. POINT CLOUD COMPRESSION PERFORMANCE

In this section, we show the performance of V-PCC for the AR/VR use case, that is, coding dynamic point cloud sequences, and of G-PCC for heritage collection and autonomous driving. Geometry and attributes are encoded using the latest reference software (TMC2v8.0 [25] and TMC13v7.0 [54]) following the common test conditions (CTC, [65]) set by the MPEG group. Furthermore, distortions due to compression artifacts are measured with the pc_error tool [66], where in addition to the geometric distortion from the point-to-point (D1) and point-to-plane (D2) metrics, attribute distortion in terms of PSNR and bitrates are also reported. The part of the PCC dataset that is used in our simulations is depicted in Fig. 17. The results presented here can be reproduced by following the CTC [65].

A) V-PCC performance

Here we show the compression performance of TMC2 for encoding dynamic point cloud sequences, such as the Longdress, Red and Black, and Soldier sequences [1]. These are composed of a 10 s point cloud sequence of a human subject at 30 frames per second. Each point cloud frame has approximately 1 million points, with voxelized positions and RGB color attributes. Since the beginning of the standardization process, the reference software performance has improved significantly, as shown in the graphs depicted in Fig. 18. For example, for the Longdress sequence, the gain achieved with TMC2v8.0 compared against TMC2v1.0 is about 60% savings in BD-rate for the point-to-point metric (D1) and about 40% savings in BD-rate for the luma attribute.

Both point-to-point and point-to-plane metrics show the efficiency of the adopted techniques. Furthermore, improvement in the RD performance for attributes can also be verified. As TMC2 relies on the existing video codec for the compression performance, the improvement mostly comes from encoder optimizations such as patch allocation among frames in a time-consistent manner [39], or motion estimation guidance for the video coder from the patch generator [46].

Figure 19 provides a visualization of the coded sequences using the MPEG PCC Reference Rendering Software [67]. With the latest TMC2 version 8, a 10–15 Mbps bitstream provides good visual quality above 70 dB, while the initial TM v1 needed 30–40 Mbps for the same quality.

B) G-PCC performance

There are different encoding scenarios involving the geometry representation (octree or trisoup), the attribute transform (RAHT, Predicting, or Lifting Transforms), as well as the compression method (lossy, lossless, and near-lossless). Depending on the use case, a subset of these tools is selected. In this section, we show a comparison between the RAHT and Predicting/Lifting Transforms, both using lossless octree geometry and lossy attribute compression schemes. Results for two point clouds that represent specific use cases are presented.


Fig. 16. Forward and Inverse Predicting/Lifting Transform.

Fig. 17. Point cloud test set, (a) Longdress, (b) Red and Black, (c) Soldier, (d) Head, (e) Ford.


Fig. 18. TMC2v1.0 versus TMC2v8.0, (a) Point-to-point geometry distortion (D1), (b) Point-to-plane geometry distortion (D2), (c) Luma attribute distortion.

Fig. 19. Subjective quality using PCC Rendering Software, (a) Longdress at 2.2 Mbps, (b) Longdress at 10.9 Mbps, (c) Longdress at 25.2 Mbps.



Fig. 20. Examples of G-PCC coding performance.

14 million points) is an example of cultural heritage appli- work has been the joint participation of both video cod-
cation. In this use case, point clouds are dense, and color ing and computer graphics experts, working together to
attribute preservation is desired. The second point cloud provide a 3D codec standard incorporating state-of-the-
(ford_01_q1 mm, about 80 thousand points per frame) is art technologies from both 2D video coding and computer
in fact a sequence of point clouds, and represents the graphics.
autonomous driving application. In this use case, point In this paper, we presented the main concepts related to
clouds are very sparse, and in general reflectance informa- PCC, such as point cloud voxelization, 3D to 2D projec-
tion preservation is desired. Figure 20(a) shows luminance tion, and 3D data structures. Furthermore, we provided an
PSNR plots (Y-PSNR) for head_00039_vox12 encoded with overview of the coding tools used in the current test models
RAHT (with and without transform domain prediction) for V-PCC and G-PCC standards. For V-PCC, topics such
and Lifting Transform. Figure 20(b) shows reflectance as patch generation, packing, geometry, occupancy map,
PSNR plots (R-PSNR) for ford_01_q1 mm encoded with the attribute image generation, and padding were discussed.
same attribute encoders. As stated before, in both cases, For G-PCC, topics such as octree, trisoup geometry coding,
geometry is losslessly encoded. and 3D attribute transforms such as RAHT and Predict-
PSNR plots show that for head_00039_vox12, RAHT ing/Lifting were discussed. Some illustrative results of the
with transform domain prediction outperforms RAHT compression performance of both encoding architectures
without prediction and the Lifting Transform in terms were presented and results indicate that these tools achieve
of luminance rate-distortion. For ford_01_q1 mm, RAHT the current state-of-the-art performance in PCC, consid-
outperforms Lifting; however, turning off the transform ering use cases like dynamic point cloud sequences, world
domain prediction yields slightly better results. heritage collection, and autonomous driving.
As for lossless geometry and lossless attribute compression, the Predicting Transform is used. In this case, considering both geometry and attributes, the compression ratios for head_00039_vox12 and ford_01_q1mm are around 3 to 1 and 2 to 1, respectively. Taking all the point clouds in the CTC into account, the average compression ratio is around 3 to 1. A comparison with the open-source 3D codec Draco [68] was performed in [69]. For the same two point clouds, the observed compression ratios for Draco were around 1.7 to 1 in both cases and, considering all the CTC point clouds, the average compression ratio for Draco is around 1.6 to 1. In summary, the current TMC13 implementation outperforms Draco by roughly a factor of two for lossless compression.
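To make the meaning of these ratios concrete, the short sketch below computes the compression ratio and the equivalent number of bits per input point from file sizes. The exact raw-size convention (e.g., binary versus ASCII PLY) follows the CTC [65], so figures obtained this way are indicative only; the function and argument names are illustrative.

import os

def compression_stats(raw_path, coded_path, num_points):
    # raw_path: uncompressed input file (e.g., a PLY file)
    # coded_path: bitstream produced by the encoder
    raw_bytes = os.path.getsize(raw_path)
    coded_bytes = os.path.getsize(coded_path)
    ratio = raw_bytes / coded_bytes          # e.g., around 3 for head_00039_vox12 (lossless)
    bits_per_point = 8.0 * coded_bytes / num_points
    return ratio, bits_per_point

As a purely illustrative calculation, if one assumes a raw representation of about 60 bits per point (three 12-bit coordinates plus 24-bit color), a 3 to 1 ratio corresponds to roughly 20 bits per point in the bitstream.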
VI. CONCLUSION

Recent progress in 3D capture technologies has provided tremendous opportunities for the capture and generation of 3D visual data. A unique aspect of PCC standardization work has been the joint participation of both video coding and computer graphics experts, working together to provide a 3D codec standard that incorporates state-of-the-art technologies from both 2D video coding and computer graphics.

In this paper, we presented the main concepts related to PCC, such as point cloud voxelization, 3D-to-2D projection, and 3D data structures. Furthermore, we provided an overview of the coding tools used in the current test models for the V-PCC and G-PCC standards. For V-PCC, topics such as patch generation, packing, geometry, occupancy map, attribute image generation, and padding were discussed. For G-PCC, topics such as octree and trisoup geometry coding, and 3D attribute transforms such as RAHT and Predicting/Lifting, were discussed. Some illustrative results of the compression performance of both encoding architectures were presented, and the results indicate that these tools achieve the current state-of-the-art performance in PCC, considering use cases such as dynamic point cloud sequences, world heritage collections, and autonomous driving.

It is expected that the two PCC standards will provide competitive solutions to a new market, satisfying a variety of application requirements and use cases. In the future, MPEG is also considering extending the PCC standards to new use cases, such as dynamic mesh compression for V-PCC, and including new tools, such as inter coding for G-PCC. For the interested reader, the MPEG PCC website [70] provides further informative resources as well as the latest test model software for V-PCC and G-PCC.

REFERENCES

[1] Tudor P.N.: MPEG-2 video compression tutorial, IEEE Colloquium on MPEG-2 – What it is and What it isn’t, London, UK, 1995, 2/1–2/8.
[2] Richardson I.: The H.264 Advanced Video Compression Standard. Vcodex Ltd., 2010.
[3] Sullivan G.J.; Ohm J.-R.; Han W.-J.; Wiegand T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol., 22 (12) (2012), 1649–1668.


[4] MPEG 3DG, Call for proposals for point cloud compression v2, ISO/IEC JTC 1/SC 29/WG 11 N16763, 2017.
[5] Scharstein D.; Szeliski R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vision, 47 (2002), 7–42.
[6] Kim H.K.I.; Kogure K.; Sohn K.: A real-time 3D modeling system using multiple stereo cameras for free-viewpoint video generation. Lecture Notes Comput. Sci. Image Anal. Recognit., 4142 (2006), 237–249.
[7] Zhang Z.: Microsoft Kinect sensor and its effect. IEEE MultiMedia, 19 (2) (2012), 4–10.
[8] Apple. Accessed: January 21, 2020. [Online]. Available: www.apple.com.
[9] Intel Realsense. Accessed: January 21, 2020. [Online]. Available: https://www.intelrealsense.com/.
[10] Sony Depthsensing Solutions. Accessed: January 21, 2020. [Online]. Available: https://www.sony-depthsensing.com.
[11] Accessed: January 21, 2020. [Online]. Available: www.8i.com.
[12] Intel Newsroom. Accessed: January 21, 2020. [Online]. Available: https://newsroom.intel.com/news/huge-geodesic-dome-worlds-largest-360-degree-movie-set/#gs.a1kz65.
[13] Innovation Studios. Accessed: January 21, 2020. [Online]. Available: https://www.deloittedigital.com/us/en/offerings/innovation-studios.html.
[14] Nurulize. Accessed: January 21, 2020. [Online]. Available: https://www.sonypictures.com/corp/press_releases/2019/0709/sonypicturesaccquiresvirtualproductionsoftwarecompanynurulize.
[15] Chou P. et al.: 8i Voxelized Full Bodies – a voxelized point cloud dataset, ISO/IEC JTC 1/SC 29/WG 11 m40059, 2017.
[16] Mekuria R.; Blom K.; Cesar P.: Point Cloud Codec for Tele-immersive Video, ISO/IEC JTC 1/SC 29/WG 11 m38136, 2016.
[17] Flynn D.; Lasserre S.: PCC Cat3 test sequences from BlackBerry|QNX, ISO/IEC JTC 1/SC 29/WG 11 m43647, 2018.
[18] Tulvan C.; Preda M.: Point cloud compression for cultural objects, ISO/IEC JTC 1/SC 29/WG 11 m37240, 2015.
[19] Schwarz S.; Personen M.: Real-time decoding and AR playback of the emerging MPEG video-based point cloud compression standard, IBC 2019.
[20] Ricard J.; Guillerm T.; Olivier Y.; Guede C.; Llach J.: Mobiles device decoder and considerations for profiles definition, ISO/IEC JTC 1/SC 29/WG 11 m49235, 2019.
[21] Curless B.; Levoy M.: A volumetric method for building complex models from range images, in Proc. 23rd Annual Conf. Computer Graphics and Interactive Techniques, August 1996.
[22] Ullman S.: The interpretation of structure from motion, in Proc. Royal Society of London, Series B, Biological Sciences, January 1979.
[23] MPEG Requirements, Requirements for point cloud compression, ISO/IEC JTC 1/SC 29/WG 11 N16330, 2016.
[24] Schwarz S. et al.: An emerging point cloud compression standard. IEEE J. Emerg. Select. Topics Circ. Syst., 9 (1) (2019), 133–148.
[25] MPEG 3DG, V-PCC Test Model v8, ISO/IEC JTC1/SC29/WG11 N18884, 2019.
[26] Gu X.; Gortler S.; Hoppe H.: Geometry images, in Proc. 29th Annual Conf. Computer Graphics and Interactive Techniques, July 2002.
[27] Sander P.; Wood Z.; Gortler S.; Snyder J.; Hoppe H.: Multi-chart geometry images, Symp. Geometry Processing, June 2003.
[28] MPEG 3DG, V-PCC codec description, ISO/IEC JTC1/SC29/WG11 N18892, 2019.
[29] Hoppe H.; DeRose T.; Duchamp T.; McDonald J.; Stuetzle W.: Surface reconstruction from unorganized points, in Proc. 19th Annual Conf. Computer Graphics and Interactive Techniques, July 1992.
[30] Guede C.; Cai K.; Ricard J.; Llach J.: Geometry image coding improvements, ISO/IEC JTC1/SC29/WG11 M42111, 2018.
[31] Guede C.; Ricard J.; Llach J.: Spatially adaptive geometry and texture interpolation, ISO/IEC JTC1/SC29/WG11 M43658, 2018.
[32] Dawar N.; Najaf-Zadeh H.; Joshi R.; Budagavi M.: PCC TMC2 Interleaving in geometry and texture layers, ISO/IEC JTC1/SC29/WG11 M43723, 2018.
[33] Cai K.; Llach J.: Signal multiple points along one projection line in PCC TMC2 lossless mode, ISO/IEC JTC1/SC29/WG11 M42652, 2018.
[34] Kuma S.; Nakagami O.: PCC TMC2 with additional projection plane, ISO/IEC JTC1/SC29/WG11 M43494, 2018.
[35] Rhyu S.; Oh Y.; Budagavi M.; Sinharoy I.: [PCC] TMC2 Surface separation for video encoding efficiency, ISO/IEC JTC1/SC29/WG11 M43668, 2018.
[36] Rhyu S.; Oh Y.; Budagavi M.; Sinharoy I.: [PCC] TMC2 Projection directions from bounding box, ISO/IEC JTC1/SC29/WG11 M43669, 2018.
[37] Graziosi D.: [PCC] TMC2 Patch flexible orientation, ISO/IEC JTC1/SC29/WG11 M43680, 2018.
[38] Zhang D.: A new patch side information encoding method for PCC TMC2, ISO/IEC JTC1/SC29/WG11 M42195, 2018.
[39] Graziosi D.; Tabatabai A.: [V-PCC] New contribution on patch packing, ISO/IEC JTC1/SC29/WG11 M47499, 2019.
[40] Kim J.; Tourapis A.; Mammou K.: Patch precedence for low delay V-PCC decoding, ISO/IEC JTC1/SC29 WG11 M47826, 2019.
[41] Valentin V.; Mammou K.; Kim J.; Robinet F.; Tourapis A.; Su Y.: Proposal for improved occupancy map compression in TMC2, ISO/IEC JTC1/SC29/WG11 M42639, 2018.
[42] Joshi R.; Dawar N.; Budagavi M.: [V-PCC] [New Proposal] On occupancy map compression, ISO/IEC JTC1/SC29/WG11 M46049, 2019.
[43] Rhyu S.; Oh Y.; Sinharoy I.; Joshi R.; Budagavi M.: Texture padding improvement, ISO/IEC JTC1/SC29/WG11 M42750, 2018.
[44] Zakharchenko V.; Solovyev T.: [PCC] TMC2 modified random access configuration, ISO/IEC JTC1/SC29/WG11 M42719, 2018.
[45] Litwic L.: 10-bit coding results, ISO/IEC JTC1/SC29/WG11 M43712, 2018.
[46] Li L.; Li Z.; Zakharchenko V.; Chen J.: [V-PCC] [New contribution] Motion Vector prediction improvement for point cloud coding, ISO/IEC JTC1/SC29 WG11 Doc. m44941, 2018.
[47] Kuma S.; Nakagami O.: PCC CE1.3 Recolor method, ISO/IEC JTC1/SC29/WG11 M42538, 2018.
[48] Graziosi D.: V-PCC New Proposal (related to CE2.12): Harmonic background filling, ISO/IEC JTC1/SC29/WG11 M46212, 2019.
[49] Ricard J.; Guede C.; Llach J.; Olivier Y.; Chevet J.-C.; Gendron D.: [VPCC-TM] New contribution on duplicate point avoidance in TMC2, ISO/IEC JTC1/SC29/WG11 M44784, 2018.
[50] Nakagami O.: PCC TMC2 low complexity geometry smoothing, ISO/IEC JTC1/SC29/WG11 M43501, 2018.
[51] Joshi R.; Najaf-Zadeh H.; Budagavi M.: [V-PCC] On attribute smoothing, ISO/IEC JTC1/SC29/WG11 M51003, 2019.
[52] MPEG 3DG, Text of ISO/IEC DIS 23090-5 Video-based point cloud compression, ISO/IEC JTC1/SC29/WG11 N18670, 2019.


[53] Shao Y.; Jin J.; Li G.; Liu S.: Report on point cloud tile and slice based coding, ISO/IEC JTC1/SC29/WG11 M48892, 2019.
[54] MPEG 3DG, G-PCC codec description v4, ISO/IEC JTC1/SC29/WG11 N18673, 2019.
[55] Recommendation ITU-R BT.709-6, Parameter values for the HDTV standards for production and international programme exchange, 2015.
[56] Meagher D.: Geometric modeling using octree encoding. Comput. Graph. Image Process., 19 (1981), 129–147.
[57] de Queiroz R.L.; Chou P.A.: Compression of 3D point clouds using a region-adaptive hierarchical transform. IEEE Trans. Image Processing, 25 (8) (2016), 3947–3956.
[58] Lasserre S.; Flynn D.: Inference of a mode using point location direct coding in TMC3, ISO/IEC JTC1/SC29/WG11 M42239, 2018.
[59] Nakagami O.: Report on Triangle soup decoding, ISO/IEC JTC1/SC29/WG11 m52279, 2020.
[60] MPEG 3DG, PCC Test Model Category 3 v0, ISO/IEC JTC1/SC29/WG11 N17249, 2017.
[61] Mammou K.; Tourapis A.; Kim J.; Robinet F.; Valentin V.; Su Y.: Proposal for improved lossy compression in TMC1, ISO/IEC JTC1/SC29/WG11 M42640, 2018.
[62] Sandri G.P.; Chou P.A.; Krivokuća M.; de Queiroz R.L.: Integer alternative for the region-adaptive hierarchical transform. IEEE Signal Processing Letters, 26 (9) (2019), 1369–1372.
[63] Flynn D.; Lasserre S.: Report on up-sampled transform domain prediction in RAHT, ISO/IEC JTC1/SC29/WG11 M49380, 2019.
[64] MPEG 3DG, CE 13.6 on attribute prediction strategies, ISO/IEC JTC1/SC29/WG11 N18665, 2018.
[65] MPEG 3DG, Common test conditions for point cloud compression, ISO/IEC JTC1/SC29/WG11 N18665, 2019.
[66] Tian D.; Ochimizu H.; Feng C.; Cohen R.; Vetro A.: Evaluation metrics for point cloud compression, ISO/IEC JTC1/SC29/WG11 m39316, 2016.
[67] MPEG 3DG, User manual for the PCC Rendering software, ISO/IEC JTC1/SC29/WG11 N16902, 2017.
[68] Draco. Accessed: January 21, 2020. [Online]. Available: https://github.com/google/draco.
[69] Flynn D.; Lasserre S.: G-PCC EE13.4 Draco performance comparison, ISO/IEC JTC1/SC29/WG11 M49383, 2019.
[70] MPEG-PCC. Accessed: January 21, 2020. [Online]. Available: http://www.mpeg-pcc.org.

Danillo B. Graziosi received the B.Sc., M.Sc., and D.Sc. degrees in electrical engineering from the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, in 2003, 2006, and 2011, respectively. He is currently the Manager of the Next-Generation Codec Group at Sony's R&D Center US San Jose Laboratory. His research interests include video/image processing, light fields, and point cloud compression.

Ohji Nakagami received the B.Eng. and M.S. degrees in electronics and communication engineering from Waseda University, Tokyo, Japan, in 2002 and 2004, respectively. He has been with Sony Corporation, Tokyo, Japan, since 2004. Since 2011, he has been with the ITU-T Video Coding Experts Group and the ISO/IEC Moving Pictures Experts Group, where he has been contributing to video coding standardization. He is an Editor of the ISO/IEC committee draft for video-based point cloud compression (ISO/IEC 23090-5) and the ISO/IEC working draft for geometry-based point cloud compression (ISO/IEC 23090-9).

Satoru Kuma received the B.Eng. and M.Eng. degrees in information and system information engineering from Hokkaido University, Japan, in 1998 and 2000, respectively. He works at Sony Corporation, Tokyo, Japan. His research interests include video/image processing, video compression, and point cloud compression.

Alexandre Zaghetto received his B.Sc. degree in electronic engineering from the Federal University of Rio de Janeiro (2002), his M.Sc. degree in 2004, and his Dr. degree in 2009 from the University of Brasilia (UnB), both in electrical engineering. Prior to his current occupation, he was an Associate Professor at the Department of Computer Science/UnB. Presently, Dr. Zaghetto works at Sony's R&D Center San Jose Laboratory, California, as a Senior Applied Research Engineer. He is a Senior Member of IEEE and his research interests include, but are not limited to, digital image and video processing, and point cloud compression.

Teruhiko Suzuki received the B.Sc. and M.Sc. degrees in physics from Tokyo Institute of Technology, Japan, in 1990 and 1992, respectively. He joined Sony Corporation in 1992. He has worked on research and development for video compression, image processing, signal processing, and multimedia systems. He has also participated in standardization activities in ISO/IEC JTC1/SC29/WG11 (MPEG), ISO/IEC JTC1, ITU-T, JVT, JCT-VC, JVET, SMPTE, and IEC TC100.

Ali Tabatabai (LF'17) received the bachelor's degree in electrical engineering from Tohoku University, Sendai, Japan, and the Ph.D. degree in electrical engineering from Purdue University. He was the Vice President of the Sony US Research Center, where he was responsible for research activities related to camera signal processing, VR/AR capture, and next-generation video compression. He is currently a Consultant and a Technical Advisor to the Sony US Research Center and the Sony Tokyo R&D Center. He is an Editor of the ISO/IEC committee draft for video-based point cloud compression (ISO/IEC 23090-5).

