

3D VIDEO STREAMING







Report by: Ashok Banjara
Roll No.-13EC64R21
Design and Simulation Lab
Telecommunication Systems Engineering
Department of E & ECE, IIT Kharagpur


CONTENTS

Abstract
1) Introduction
2) State of the art
   a. Three-dimensional video formats
   b. Three-dimensional video coding
   c. Transport protocols
   d. Adaptive streaming
   e. P2P streaming
3) Conclusions
4) References











ABSTRACT

Three-dimensional (3-D) video is the next natural step in the evolution of digital media technologies.
Stereoscopic TV broadcasts have already begun in frame-compatible formats. Recent
3-D autostereoscopic displays can present multiview video with up to 200 views. While it is possible to
broadcast 3-D stereo video (two views) over digital TV platforms today, streaming over Internet Protocol
(IP) provides a more flexible approach for distribution of stereo and free-view 3-D media to home and
mobile users with different connection bandwidths and different 3-D displays. 3D video is destined to be
available in homes and mobile devices. The natural next step is to deliver 3D content in the form of
multiview video (MVV) that enables a natural glasses-free 3D experience. Unfortunately, the number of
views needed to drive multiview displays varies depending on the price vs. quality trade-off. Therefore,
the bitrate requirement of MVV content changes according to the user's display technology, making
transmission over fixed bit rate channels inefficient. IP provides a flexible transport mechanism for 3D
content; however, well known problems such as fluctuations in available link capacity and varying
transmission delays pose challenges to 3D video services over the Internet.


1. INTRODUCTION

Stereoscopic 3D has had a significant impact on the movie industry, and public interest in 3D content
has increased over the last decade. At present, broadcast standards exist for stereoscopic 3D in a frame-
compatible format, where frames are subsampled to keep the size the same as in conventional 2D
video. Although frame compatible formats have created a seamless solution for transmission of
stereoscopic 3D video, there are serious drawbacks with this approach. First, the perceived visual quality
can be inferior compared to 2D video because the resolution of individual views is lower. Second, the
necessity of wearing glasses is a burden on viewers. There are alternative solutions that use lenticular
sheet technology; however, such autostereoscopic displays have narrow sweet spots that are
difficult to keep aligned all the time. Finally, the viewing angle of stereoscopic 3D is fixed to a single
point of view, and users do not have the free-view capability that presents a scene from different
perspectives. With the wide availability of low-cost stereo cameras, 3-D displays, and broadband
communication options, 3-D media is therefore destined to move from the movie theater to home and
mobile platforms in the form of multiview video (MVV).
In the near term, popular 3-D media will most likely be in the form of stereoscopic and multiview video
with associated spatial audio. Transmission of 3-D media, via broadcast or on-demand, to end users with
varying 3-D display terminals (e.g., TV, laptop, and mobile devices) and bandwidths is one of the biggest
challenges to bring 3-D media to the home and mobile devices.
Multiview displays allow users to experience natural depth perception without wearing special glasses
and can present a scene from multiple angles. With each additional view, it is possible to cover a wider
range until the display's limit is reached. Therefore, the required number of views to drive multiview
autostereoscopic displays is not fixed but depends on the display technology (based on the price vs.
quality trade-off). The requirement to transmit varying numbers of views according to the user's display capability
is a key challenge in the transmission of MVV contents because the traditional broadcast standards such
as digital video broadcasting (DVB) operate over fixed bit rate channels, assuming that there exists an
upper bound for the bit rate of the content. This is not the case with multiview video.
There are two main platforms for 3-D video delivery: digital television (DTV) platforms and the Internet
Protocol (IP) platform, as depicted in Fig. 1. There are already broadcasters who have started 3DTV
broadcasts using DTV platforms. For example, digital video broadcasting (DVB) is a suite of open
standards for DTV, which has already been used to broadcast stereo video using frame-compatible
formats. However, DTV platforms are not well suited to transmit multiview content with a variable
number of views to accommodate different 3-D display technologies. On the other hand, the IP
platform, with applications such as IPTV and WebTV, provides a more flexible channel to transmit as
many views as required by the user's display terminal and at a quality level allowed by the bandwidth of
each user.

Fig. 1. Platforms for 3-D media transport.

Fortunately, the Internet can serve as a flexible platform to deliver MVV as it naturally supports varying
bit rates. With simple modifications, IPTV and WebTV applications can serve as many views as needed.
Moreover, with the increasing availability of mobile platforms, the IP network can serve as a medium to
deliver MVV to home and mobile users. However, due to the large bandwidth requirement of MVV
contents, it may become difficult to achieve service scalability against increasing numbers of recipients.
Therefore, the classical server-client model may not adequately address all the challenges of MVV
delivery over IP. On the other hand, peer-to-peer (P2P) overlays can distribute the task of data
forwarding over peers and alleviate the problem of high bandwidth requirements.
The International Telecommunication Union (ITU) defines IPTV as multimedia services delivered over IP-
based managed networks that provide the required level of quality of service (QoS) and experience,
security, interactivity, and reliability. On the other hand, WebTV services are offered over Internet
connections that support best-effort delivery with no QoS guarantees, making them accessible anytime,
anywhere, as opposed to IPTV, which is limited by the service provider's infrastructure. Unlike traditional
broadcast, IP services are offered at varying speeds and costs over a variety of physical infrastructures,
such as fixed or wireless telecommunications networks. Furthermore, it is possible to provide a variety
of service architectures, such as server-client (unicast) or peer-to-peer (multicast), using different
transport protocol options, such as HTTP/TCP or RTP/UDP, over the IP platform. Hence, 3-D video
encoding methods that offer functionalities such as rate scalability, resolution scalability, view
scalability, view selectivity, and packet-loss resilience, without a significant sacrifice in encoding
efficiency, become key requirements in order to take full advantage of the flexibility that the IP platform
provides. In order to provide the best end-to-end quality of user experience, 3-D video encoding
methods and transport mechanisms must be jointly optimized, considering the available network rate,
the end-user display terminal, and possibly the human perception of stereoscopy.

IP does not guarantee quality of service and poses serious challenges to video streaming applications
that must deliver time-sensitive multimedia content at a desired quality of experience (QoE). Adaptive
streaming is a key technology to handle IP artifacts such as varying link capacity as it can match source
video rate to available capacity. Rate adaptation is more essential in P2P video streaming because peers
have limited upload capacities, making connections more prone to rate fluctuations.
At present, 3DTV can be considered the logical next step for complementing HDTV to incorporate 3D
perception into the viewing experience. Multiview Video (MVV) systems have gained significant interest
recently, particularly in terms of view synthesis approaches. View synthesis usually falls into two
categories: Geometry-Based Rendering (GBR) and Image-Based Rendering (IBR). Typically, GBR exploits
the 3D geometric knowledge of the scene, which requires 3D models of the objects. However,
such models require millions of polygons, complex lighting models, and extensive texture mapping, and
incur great computational cost. IBR techniques have received attention as an attractive alternative to GBR
for view synthesis. Instead of 3D geometric primitives, a collection of images is used to generate other
images. Among a variety of IBR techniques, the Layered Depth Image (LDI) is one of the most efficient
view synthesis methods for complex 3D scenes.
An extension of LDI representation called Layered Depth Video (LDV) has been proposed as a 3D video
data representation. LDV is considered a suitable 3D video data representation because it provides
one full (central) view as a main layer, plus additional enhancement layers that contain residual texture
and depth data representing the side views. LDV is thus both a derivative of and an alternative to the
Multiview Video-plus-Depth (MVD) representation: it transmits only one full view (with associated
residual data) over the channel, and the non-transmitted side views are then generated by view
synthesis as a view transfer between the central and side views. The central view is projected onto the
side views by IBR. The problem, however, is that not every pixel exists in every view,
which results in holes when the central view is projected. View synthesis thus exposes
parts of the scene that are occluded in the central view and makes them visible in the side views; this
process is known as disocclusion.

One way of dealing with these disocclusions would be to rely on pre-processing the depth video to allow
the reduction of depth data discontinuities in a way that decreases the disocclusions. However, this
would mean introducing filtering-induced distortion to the depth video, which would reduce the user's
original depth perception. It is possible to remove disocclusions by considering more complex multi-
dimensional data representations, such as the LDV representation, which allow the storage of additional
depth and color values for pixels that are occluded in the central view. This extra data provides the
necessary information to fill in disoccluded areas in rendered, novel views.

One solution suggested by Tauber et al. consists of combining IBR with inpainting techniques to deal
with large disocclusions, due to the natural similarity between the damaged holes filled by image inpainting and
disocclusions in view synthesis. Image inpainting, also known as image completion, fills in pixels in a
large missing region with the information derived from the pixels that surround it.
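As an illustrative sketch (and not the specific algorithm of Tauber et al.), the hole-filling step can be prototyped with a generic inpainting routine such as OpenCV's cv2.inpaint, applied to the synthesized view and a mask of its disoccluded pixels; the function name and parameters below are assumptions for the sketch.

```python
import cv2

def fill_disocclusions(warped_view, hole_mask, radius=3):
    """Fill disoccluded (hole) pixels of a synthesized view by image inpainting.

    warped_view: HxWx3 uint8 image obtained by warping the central view.
    hole_mask:   HxW uint8 mask, non-zero where no central-view pixel was projected.
    """
    # Telea's fast-marching inpainting propagates surrounding texture into the
    # masked region; cv2.INPAINT_NS (Navier-Stokes based) is an alternative flag.
    return cv2.inpaint(warped_view, hole_mask, radius, cv2.INPAINT_TELEA)
```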

A more recent research direction is to consider a combination of DVB and IP platforms to deliver
multiview video (MVV) in order to provide a free-view TV/video experience. The DVB channel provides a
dedicated platform that can be used for transmitting stereoscopic media in frame-compatible format
wrapped in an MPEG-2 transport stream, but its physical channel bandwidth is too constrained to allow
transmitting MVV. The IP platform is more flexible in terms of bandwidth but is not reliable. Server-
client or peer-to-peer (P2P) streaming over IP can be used standalone or to supplement DVB (to
deliver additional views) in order to provide a free-view 3-D experience.



2. STATE OF THE ART
A. Three-Dimensional Video Formats
To support 3DTV system requirements, many 3D video data representations have been investigated in
terms of their complexity, efficiency, and functionality according to the following general requirements:
- utilize as much of the existing delivery infrastructure and media as possible,
- require minimal changes to device components,
- remain backwards compatible - it is unacceptable for 3D services to impair existing devices,
- support a wide range of display devices and allow for future extension,
- deliver high quality.
Current 3-D video formats can be classified as stereoscopic and multiview as depicted in Fig. 2.
Stereoscopic systems are the most well-known and simple acquisition techniques for 3D video data
representation. Stereoscopic video can provide a 3D impression by using left and right videos as a pair,
thereby creating a stereo camera system, while a monoscopic 2D video cannot. A pair of 2D videos is
acquired: one for the left eye, and the other for the right. As a generalization of stereo video, MVV can
be considered an extension of the stereo video data representation to a higher number of views.
Common stereo video formats are frame-compatible and full resolution (sequential) formats. There are
also depth based representations, which are often preferred for efficient transmission of multiview
video as the number of views increases. Frame-compatible stereo video formats have been developed
to provide 3DTV services over the existing digital TV broadcast infrastructures. They employ pixel
subsampling in order to keep the frame size and rate the same as those of 2-D video.

Fig. 2. Three-dimensional video formats and coding options for fixed-rate and rate-adaptive streaming.

Common subsampling patterns include side by side, top and bottom, line interleaved, and
checkerboard. Side-by-side format applies horizontal subsampling to the left and right views, reducing
horizontal resolution by 50%. The subsampled frames are then put together side by side. Likewise, the top-
and-bottom format vertically subsamples the left and right views and stitches them over-under. In the
line interleaved format, the left and right views are again subsampled vertically, but put together in an
interleaved fashion. Checkerboard format subsamples left and right views in an offset grid pattern and
multiplexes them into a single frame in a checkerboard layout. Among these formats, side by side and
top and bottom are selected as mandatory for broadcast by the latest HDMI specification 1.4a.
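As a minimal sketch of how frame-compatible packing works (assuming the left and right frames are available as NumPy arrays of equal size), side-by-side and top-and-bottom multiplexing can be written as:

```python
import numpy as np

def pack_side_by_side(left, right):
    """Halve the horizontal resolution of each view and place them side by side."""
    return np.hstack((left[:, ::2], right[:, ::2]))

def pack_top_and_bottom(left, right):
    """Halve the vertical resolution of each view and stack them over-under."""
    return np.vstack((left[::2, :], right[::2, :]))

# Two 1080p views are multiplexed into a single frame with the original 2D size.
left = np.zeros((1080, 1920, 3), dtype=np.uint8)
right = np.zeros((1080, 1920, 3), dtype=np.uint8)
assert pack_side_by_side(left, right).shape == (1080, 1920, 3)
assert pack_top_and_bottom(left, right).shape == (1080, 1920, 3)
```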

Fig. 3. Multiplexing stereo video
Frame packing, which is the mandatory format for movie and game content in the HDMI specification
version 1.4a, stores frames of left and right views sequentially, without any change in resolution. This
format, which supports full HD stereo video, requires, in the worst case, twice as much bandwidth as
monocular video. The extra bandwidth requirement may be kept to around 50% by using the multiview
video coding (MVC) standard, which was selected by the Blu-ray Disc Association as the coding format for
3-D video.
Multiview autostereoscopic displays project multiple views into the viewing zone at the same time;
essentially, the consecutive views act like stereo pairs (Fig. 4). As a result, head motion parallax viewing
can be supported within practical limits, but the amount of data to be processed and transmitted
increases significantly compared to using conventional stereo data or 2D video. The development of a
wide range of multiview autostereoscopic displays and MVV applications increases the number of
output views needed. Users can therefore choose their own viewpoint (e.g., the Super Bowl XXXV sports
event or the bullet-time effect). Advanced 3D video applications like these require a 3D video format that can
render a continuum of output views or a very large number of different output views at the decoder
side. MVV formats are still not sufficient to support such requirements without extensively increasing
the number of input views and consequently the bandwidth.

Fig. 4. Efficient support of multiview autostereoscopic displays based on MVV content.
Video-plus-depth data representation has been introduced to overcome this issue. It can respond to the
stereoscopic vision needs at the receiver side, as shown in Fig. 5, and at the same time dramatically
decrease the transmission bandwidth compared to the conventional stereo video data
representation. Initially studied in the computer vision field, the video-plus-depth format provides a
regular 2D video enriched with its associated depth video (see Fig. 6).

Fig. 5. Efficient support of stereo autostereoscopic displays based on video-plus-depth content.



Fig. 6. Texture picture and its associated depth map.
The 2D video provides the texture information, the color intensity, and the structure of the scene, while
the depth video represents the per-pixel distance between the optical center of the camera and a 3D
point in the visual scene. Hereafter, the 2D video may be denoted as the texture video, as opposed to the
depth video.
Great effort has been made to estimate depth information from multiple 2D video inputs. Thanks to
recent advances in semiconductor processes, it is possible to directly capture depth video using a time-
of-flight (TOF) camera, also known as a depth camera. The TOF camera is based on TOF technology that
measures the distance between the camera and the scene in real time. This camera emits infrared light
that is reflected by the environment and then comes back to the camera's sensor. The traveling time of
the light is then measured for each pixel of the sensor and used to compute the depth of the scene. The
depth video can be regarded as a monochromatic texture-less video signal. Generally, the depth data is
quantized with 8 bits, i.e., the closest point is associated with the value 255 and the most distant point is
associated with the value 0. As a result, the depth video appears as a smooth gray-level representation.
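A common convention for this 8-bit quantization (a sketch assuming known nearest and farthest clipping distances z_near and z_far, and quantization of inverse depth as used in MPEG-style depth maps) is:

```python
import numpy as np

def quantize_depth(z, z_near, z_far):
    """Map metric depth z to an 8-bit value: 255 = nearest point, 0 = farthest.

    Quantizing inverse depth spends more levels on nearby objects, where depth
    errors are most visible after view synthesis.
    """
    v = 255.0 * (1.0 / z - 1.0 / z_far) / (1.0 / z_near - 1.0 / z_far)
    return np.clip(np.round(v), 0, 255).astype(np.uint8)

def dequantize_depth(v, z_near, z_far):
    """Invert the 8-bit quantization back to metric depth at the decoder."""
    inv_z = v / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return 1.0 / inv_z
```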
At the client side, the second color video corresponding to the second view is reconstructed from the
transmitted video-plus-depth data by means of depth image based rendering (DIBR) techniques. The
ability to generate a stereoscopic video from video-plus-depth data at the receiver side is an extended
functionality compared to conventional stereo video data representation. Consequently, the 3D
impression can be adjusted and customized after transmission. However, because view-synthesis-
induced artifacts increase dramatically with the distance of the rendered viewpoint, video-plus-depth
can support only a very limited continuum around the available original view. To overcome this issue,
MPEG started an activity developing a 3D video standard that would support these requirements. This
standard is based on the multiview video-plus-depth (MVD) format, as shown in Fig. 7. Video-plus-depth data is
combined with multiview video (MVV) data to form the MVD format, which consists of multiple 2D
videos, each of which has an associated depth video.



Fig. 7. Efficient support of multiview autostereoscopic displays based on MVD content.
The final step in this process is rendering multiple intermediate views from the received data by DIBR. At
this point, the central and side views are all fully processed and transmitted. As an alternative to fully
transmitting the side views in addition to the central view, LDV can decrease redundancies between the
views by considering only the central view as the main layer and some residual data as enhancement
layers. This representation targets the delivery of high-quality, high-resolution images at lower
bitrates than those delivered by MVD.
We will describe residual layer generation within a three-camera system composed of one central
and two side views (left and right), as illustrated in Fig. 7. The generation process can be separated into
two main parts. First, the central view is transferred to each side view by DIBR using the given depth
video. This process is called 3D warping.

Fig. 8. 3D image warping: Projection of a 3D point on two image planes in homogeneous coordinates

Next, by subtraction, it is possible to determine which parts of the side views are occluded in the central
view. These are then assigned as residual texture and depth data and transmitted, while the rest is
omitted. This process includes a function for mapping points from the central view (the reference image
plane) to the side views (the targeted image plane) as illustrated in Fig. 8 and Fig. 9.
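A simplified sketch of this warping and residual-extraction step is given below; it assumes pinhole cameras with a shared intrinsic matrix K, a known rotation R and translation t from the central to the side camera, and a per-pixel metric depth map for the central view (occlusion handling via z-buffering is omitted for brevity).

```python
import numpy as np

def warp_central_to_side(central, depth, K, R, t):
    """Forward-warp the central view into a side view by 3D image warping.

    central: HxWx3 texture of the central view.
    depth:   HxW metric depth map of the central view.
    Returns the warped side view and a mask of unfilled pixels (disocclusions),
    which correspond to the residual data LDV transmits for this side view.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3xN homogeneous pixels

    # Back-project central pixels to 3D, move them to the side camera, re-project.
    X = np.linalg.inv(K) @ pix * depth.reshape(-1)
    p = K @ (R @ X + t.reshape(3, 1))
    u2 = np.round(p[0] / p[2]).astype(int)
    v2 = np.round(p[1] / p[2]).astype(int)

    warped = np.zeros_like(central)
    filled = np.zeros((H, W), dtype=bool)
    ok = (u2 >= 0) & (u2 < W) & (v2 >= 0) & (v2 < H)
    warped[v2[ok], u2[ok]] = central.reshape(-1, 3)[ok]   # last write wins (no z-buffer)
    filled[v2[ok], u2[ok]] = True
    return warped, ~filled
```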

Fig. 9. Advanced LDV-based 3DTV concept; Pos: viewpoint, R: right eye, L: left eye, V: view/image, D:
depth, RV: Residual Video Layer, RD: Residual Depth Layer.

B. Three-Dimensional Video Coding
The method of choice for 3-D video encoding should depend on the transport option and raw video
format. For example, for transmission of stereo video over fixed bandwidth broadcast channels, a
nonscalable monocular video codec, such as H.264/AVC, can be used to encode stereo video in one of
the frame-compatible formats. Adaptive streaming of stereo and MVV in sequential or multiview-plus-depth
formats has two main options:
1. Simulcast encoding: encode each view and/or depth map independently using a scalable or
nonscalable monocular video codec, which enables streaming each view over separate
channels; and clients can request as many views as their 3-D displays require without worrying
about inter-view dependencies.

2. Dependent encoding: encode views using MVC to decrease the overall bit rate by exploiting the
inter-view redundancies. We note that, in this case, special inter-view prediction structures
must be employed to enable view-scalable and view-selective adaptive streaming.
It is also possible to exploit features of the human visual system (HVS) to achieve more efficient
compression by degrading the quality of one of the views without introducing noticeable artifacts. This
approach is known as asymmetric coding. We review some common encoding options for adaptive
streaming of 3-D video in more detail below.
1) Simulcast View Coding Using SVC: Simulcast coding using the SVC standard refers to producing
scalable 3-D video, where each view is encoded independently. Here, two approaches can be
followed for scalability: either all views can be coded scalably, or some views can be coded
scalably using SVC while others are coded nonscalably using H.264/AVC.

SVC, which is specified as an annex of the advanced video coding (AVC) standard, provides spatial,
temporal, and quality scalability. SVC provides temporal scalability through the usage of
hierarchical prediction structures, whereas spatial and quality scalability are supported by
multilayer coding. Quality scalability is supported in two modes: coarse-grained scalability (CGS)
and medium-grained scalability (MGS). CGS, also called layer-based scalability, is based on the
multilayer concept of SVC, meaning that rate adaptation must be performed on a complete-layer
basis. The MGS concept, however, allows any enhancement-layer network abstraction layer
(NAL) unit to be discarded from a quality-scalable bit stream in decreasing quality_id order,
providing packet-based scalability. Also, it is possible to fragment an MGS layer into multiple
sublayers by grouping zigzag-scanned transform coefficients and in this way increase the
number of rate adaptation points.
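The packet-based adaptation that MGS enables can be sketched as follows; the NAL units are represented here as simple records carrying their quality_id and size, which is an illustrative data structure rather than actual SVC bitstream parsing.

```python
def adapt_mgs_access_unit(nal_units, target_bytes):
    """Drop MGS enhancement NAL units in decreasing quality_id order until the
    access unit fits the target size (packet-based quality scalability).

    nal_units: list of dicts such as {"quality_id": 0, "size": 1200, "payload": b"..."};
               quality_id 0 denotes the base quality, which is never dropped here.
    """
    kept = sorted(nal_units, key=lambda n: n["quality_id"])  # base quality first
    total = sum(n["size"] for n in kept)
    while total > target_bytes and kept and kept[-1]["quality_id"] > 0:
        total -= kept.pop()["size"]   # discard the highest remaining quality_id
    return kept
```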

2) Multiview Extension of H.264/AVC: MVC aims to offer high compression efficiency for MVV by
exploiting inter-view redundancies. It is based on the high profile of H.264/AVC, and features
hierarchical B-pictures and flexible prediction structures. At one extreme, each frame can be
predicted only from frames of the same view, which is simulcast coding. At the other extreme,
frame prediction spans all views, which is called full prediction, at the cost of a complex
dependency hierarchy. A simplified prediction scheme has been proposed that restricts inter-view
prediction to anchor pictures only, and still achieves similar rate-distortion (RD) performance. An illustration of
prediction structures for a video with five views is depicted in Fig. 10. In MVC, it is important to
perform proper illumination compensation either by preprocessing or by weighted inter-view
prediction within the coding loop. Also, large disparity or different camera calibration among
views may adversely affect the performance of MVC.
Although there has been some work on scalable MVC, such schemes utilize only a subset of the
scalability options or of the MVC prediction structures. The current implementation of the reference MVC
software (JMVC 7.0) offers only temporal and view scalability, but no quality or resolution scalability.
The effect of scalability options on the subjective quality of MVV is a current research area, and the
results are very likely to depend on the content and/or the 3-D display.


Fig. 10. Prediction structures for asymmetric MVC encoding. (a) Full prediction scheme. (b)
Simplified prediction scheme.
3) Multiview-Plus-Depth Coding: In this option, selected views and associated depth maps can be
either simulcast or dependently encoded using nonscalable or scalable codecs. It is also possible
to exploit correlations between the texture video and associated depth maps. For example, SVC
has been employed to compress texture videos and associated depth maps jointly, where up to 0.97-dB
gain is achieved for the coded depth maps compared with the simulcast scheme.

4) Asymmetric Stereoscopic Video Coding: Naturally, stereoscopic video requires higher bit rates
than monocular video. Another method to decrease the overall transmission rate is to exploit
the human visual system, which is known to tolerate a lack of high-frequency components in one
of the views. Hence, one of the views may be presented at a lower quality without degrading
the 3-D video perception. This is similar to what is being done with monocular video in which
the chrominance channels can be represented using fewer bits than the luminance, because the
human eye is less sensitive to changes in color. In asymmetric MVC coding, where alternating
views are coded at high and low quality, the inter-view dependencies should be carefully
constructed. Fig. 10 depicts a scheme in which the views are predicted only from high-quality
views in order to achieve better prediction.

C. Transport Protocols
Being the de facto reliable transport protocol of the Internet, the Transmission Control Protocol (TCP) is
the first that comes to mind when sending data over IP. However, TCP may be unsuitable for
streaming live video with a strict end-to-end delay constraint, due to TCP's lack of control over delay and
its rapidly changing transmission rate. On the other hand, TCP is the easiest choice for streaming stored
media, with its built-in congestion control, reliable transmission, and firewall friendliness, making it the
most used transport protocol to stream stored media over the Internet. Popular video distribution sites,
such as YouTube, Vimeo, and Metacafe, use HTTP over TCP to stream video to clients. Moreover, it has
been shown that using TCP for streaming video provides good performance when the available network
bandwidth is about twice the maximum video rate, with a few seconds of pre-roll delay.
An alternative to streaming video over TCP is UDP, which provides neither TCP's built-in
congestion control nor reliable, in-order packet delivery, leaving their implementation to the
application layer. Since congestion control is crucial for the stability of the Internet, it should be
implemented by the applications using UDP, which is not a straightforward task. Moreover, UDP is not
firewall friendly, due to its connectionless nature. For these reasons, UDP is not as popular as TCP for
streaming video over the Internet, although it is used by media streaming servers such as Windows Media
Server. On the other hand, video conferencing systems such as Skype and Vidyo utilize UDP for media
delivery; however, they base their failover scenarios on TCP.
The datagram congestion control protocol (DCCP) is a newer transport protocol implementing
bidirectional unicast connections of congestion-controlled, unreliable datagrams, which accommodates
a choice of modular congestion control mechanisms to be selected at connection startup. DCCP is
designed for applications like streaming media, which prefer not to use TCP due to the arbitrarily long
delays that can be introduced by reliable in-order delivery and congestion control, and which do not
want to implement the complex congestion control mechanisms that UDP lacks. It has been shown
that DCCP outperforms TCP under congestion when a video streaming scenario is considered. Moreover,
the performance of streaming video over DCCP in heterogeneous networks has been compared with UDP and
the stream control transmission protocol (SCTP), and it was concluded that DCCP achieves better results
than SCTP and UDP. The real-time transport protocol (RTP) is an application-layer protocol enabling end-to-
end delivery of media services. RTP defines a packetization format that identifies the payload type,
orders data packets, and provides timestamps to be used in media playout. RTP is typically run on top of
UDP, and may easily be used with DCCP or SCTP, but a framing mechanism is required in case it is used
over TCP. RTP is usually used together with the real-time transport control protocol (RTCP), which
monitors transmission statistics and QoS information. These transport protocols, shown in Fig. 11, can
be easily adopted in 3-D video streaming with little or no change at all.

Fig. 11. Streaming protocol stacks.

When a 3-D multicast scenario is considered, the views that compose the video are usually transmitted
over separate multicast channels, so that the clients can subscribe to as many channels as they want,
depending on their download capacity or display characteristics. For 3-D unicast, multiplexing all views
on a single connection may utilize the available network better when a TCP-compatible congestion
control scheme is used and the views are encoded at unequal rates. This is because TCP-compatible
congestion control schemes tend to divide available bandwidth equally among connections sharing the
same bottleneck link. When unequal rates are allocated to views that are sent over separate network
connections, views demanding lower rates will be overprovisioned while the ones with high bit rate
requirements will not get the network share they need. It should be noted that each video packet
should carry a view identifier, as implemented in MVC, so that the receiver can distinguish the packets of
one view from another when a single connection is used. If multiplexing views on a single
connection to overcome this fairness issue is not an option, then each view with a high bit rate may be
split over multiple connections for fairness.
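To make the single-connection case concrete, the sketch below packs the standard 12-byte RTP fixed header (RFC 3550) and, as an illustrative assumption rather than the actual MVC NAL unit header syntax, prepends a one-byte view identifier to each payload so that the receiver can demultiplex the views:

```python
import struct

def make_packet(seq, timestamp, ssrc, view_id, payload, payload_type=96):
    """Build an RTP packet (version 2, no padding/extension/CSRC) whose payload
    starts with a one-byte view identifier (illustrative, not the MVC syntax)."""
    header = struct.pack("!BBHII",
                         0x80,                  # V=2, P=0, X=0, CC=0
                         payload_type & 0x7F,   # M=0, payload type
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + bytes([view_id]) + payload

def demux_by_view(packets):
    """Group received packets per view so that each view's decoder gets its own stream."""
    views = {}
    for pkt in packets:
        view_id = pkt[12]              # first payload byte after the 12-byte RTP header
        views.setdefault(view_id, []).append(pkt[13:])
    return views
```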
D. Adaptive Streaming
For adaptive streaming, a mechanism should exist to estimate the network conditions so as to adapt the
video rate accordingly, in order to optimize the received video quality. This estimation can be performed
by requesting receiver buffer occupancy status information to prevent buffer underflow/overflow or by
combining the receiver buffer status with bandwidth estimation. A virtual network buffer between the
sender and the receiver is employed together with end-to-end delay constraints to adapt the video
transmitted, while the same virtual network buffer algorithm is also utilized to implement source rate
control and congestion control jointly. Packets may be sent depending on their rate-distortion (RD) values. In case DCCP
is used, with the TCP friendly rate control (TFRC) congestion control method selected, the TFRC rate
calculated by DCCP can be utilized by the sender to estimate the available network rate. When the video
is streamed over TCP, an average of the transmission rate can be used to determine the available
network bandwidth.
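For reference, the TFRC throughput equation (RFC 5348) that underlies this congestion control computes the allowed sending rate from the segment size s, round-trip time R, loss event rate p, and retransmission timeout t_RTO; a direct transcription is:

```python
from math import sqrt

def tfrc_rate(s, rtt, p, t_rto, b=1):
    """TCP-friendly sending rate in bytes/s from the TFRC throughput equation (RFC 5348).

    s:     segment size in bytes
    rtt:   round-trip time in seconds
    p:     loss event rate (0 < p <= 1)
    t_rto: retransmission timeout in seconds (often approximated as 4 * rtt)
    b:     packets acknowledged per ACK (typically 1)
    """
    denom = rtt * sqrt(2 * b * p / 3) + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2)
    return s / denom

# Example: 1200-byte segments, 100 ms RTT, 1% loss events; the result gives the
# sender an estimate of the rate the network can sustain for the video stream.
print(tfrc_rate(1200, 0.1, 0.01, 0.4))
```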
How to adapt the video rate to the available bandwidth depends on the encoding characteristics of the
views. One or more views can be encoded multiple times with varying bit rates, where the sender can
switch between these streams according to the network conditions. Alternatively, in HTTP live
streaming, the client selects from a number of streams containing the same material encoded at a
variety of data rates in order to adapt to the available network rate. A more elegant solution is encoding
views once with multiple layers using SVC and switching between these layers. Another video
adaptation scheme is real-time encoding with source rate control. Even SVC encoding can be performed
in real time. However, real-time encoding of MVV is difficult due to high computational requirements as
the number of views grows.
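Whichever of these encoding options is used, the selection logic is similar; a minimal sketch, assuming the content is available in a few pre-encoded versions (or SVC layer combinations) with known bitrates, is:

```python
def select_representation(bitrates_bps, estimated_bandwidth_bps, safety_margin=0.8):
    """Pick the highest-rate version or layer set that fits the estimated bandwidth.

    bitrates_bps: available encoded rates, e.g. [400e3, 800e3, 1.6e6, 3.2e6].
    A safety margin below the estimate absorbs short-term rate fluctuations.
    """
    budget = estimated_bandwidth_bps * safety_margin
    candidates = [r for r in sorted(bitrates_bps) if r <= budget]
    return candidates[-1] if candidates else min(bitrates_bps)

# With a 2 Mb/s bandwidth estimate, the 1.6 Mb/s representation is selected.
print(select_representation([400e3, 800e3, 1.6e6, 3.2e6], 2e6))
```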
E. P2P Streaming
The server-client unicast streaming model is not scalable by nature; that is, it is difficult to serve an
increasing number of clients without expanding the bandwidth capacity or creating a large content
distribution network (CDN). The most important advantage of P2P solutions over the traditional server-
client architecture is scalable media distribution. These solutions aim to reduce the bandwidth
requirement of the server by utilizing the network capacity of the clients, now called peers.

In theory, it is possible to originate only a single copy from a source and duplicate the packets along the
path to different peers at the network layer. This could have been the best solution to the scalability
problem, but unfortunately multicasting at the network layer has not been widely deployed. Current P2P
solutions use overlay networks in which the data are redirected to another peer by the application at
the edge platforms and multiple copies of the data traverse the IP network. It is evident that relying on
peers that may leave the network or stop data transmission anytime has its drawbacks but there are
already successful P2P video applications that have managed to solve such issues. It is possible to
examine these solutions under two extremes: tree-based (structured) and mesh-based (unstructured)
solutions.
3. CONCLUSIONS

We have reviewed three adaptive streaming solutions for the distribution of 3-D media. The first,
asymmetric streaming, can be utilized for displays with a limited number of views, such as five views or
less. We note that the visual experiments on asymmetric coding have been conducted so far on short
video clips, and some experts claim viewing asymmetric-coded stereo video over longer periods may
cause eye fatigue, which needs to be studied further. If this is the case, asymmetric streaming only
during periods of congestion may be more desirable. The second and third methods, streaming using
MVD and selective streaming, respectively, are intended for displays that support more views, such as
5-200 views. Selective streaming requires tracking a viewer's head position; hence, it is applicable in the
case of a single user with a head-tracking 3-D display. Thus, adaptive streaming using the MVD
representation seems to be better suited for general purpose multiview video applications with more
than five views. Broadcast of stereoscopic 3-D media over digital TV platforms has already started.
However, these platforms cannot provide sufficient bandwidth to broadcast multiview video due to
physical channel limitations. Hence, we foresee that, in the medium term, multiview video services will
be developed using the second method and these services will be deployed over the IP platform using
various architectures, including server-client and P2P.


4. REFERENCES

C. G. Gürler and A. M. Tekalp, "Peer-to-peer system design for adaptive 3D video streaming," IEEE
Commun. Mag., vol. 51, no. 5, pp. 108-114, May 2013.
C. G. Gürler, B. Görkemli, G. Saygılı, and A. M. Tekalp, "Flexible transport of 3-D video over networks,"
Proceedings of the IEEE, vol. 99, no. 4, pp. 694-707, Apr. 2011.
I. Daribo and H. Saito, "A novel inpainting-based layered depth video for 3DTV," IEEE Transactions on
Broadcasting, vol. 57, no. 2, June 2011.
K. Müller, A. Smolic, K. Dix, P. Kauff, and T. Wiegand, "Reliability-based generation and view
synthesis in layered depth video," in Proc. IEEE Workshop Multimedia Signal Process. (MMSP),
Cairns, Queensland, Australia, Oct. 2008, pp. 34-39.
