Anthony Vetro
Industry and Standards Mitsubishi Electric Research Labs
1070-986X/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society
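[Figure 1, audio portion. Audio 1 (English): surround, 128-kbps AAC, and 48-kbps AAC tracks. Audio 2 (French): 128-kbps AAC and 48-kbps AAC tracks. Horizontal axis: time. Numbered labels 1, 3, 5, and 6 mark events discussed in the text.]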
October-December 2011

and can be managed by a CDN using standard HTTP optimization techniques.

For all of these reasons, HTTP streaming has become a popular approach in commercial deployments. For instance, streaming platforms such as Apple’s HTTP Live Streaming [3], Microsoft’s Smooth Streaming [4], and Adobe’s HTTP Dynamic Streaming (see http://www.adobe.com/products/httpdynamicstreaming) all use HTTP streaming as their underlying delivery method. However, each implementation uses different manifest and segment formats; therefore, to receive the content from each server, a device must support the corresponding proprietary client protocol. A standard for HTTP streaming of multimedia content would allow a standard-based client to stream content from any standard-based server, thereby enabling interoperability between servers and clients of different vendors.

Observing the market prospects and requests from the industry, MPEG issued a Call for Proposal for an HTTP streaming standard in April 2009. Fifteen full proposals were received by July 2009, when MPEG started the evaluation of the submitted technologies. In the two years that followed, MPEG developed the specification with participation from many experts and with collaboration from other standard groups, such as the Third Generation Partnership Project (3GPP) [5]. The resulting standard, known as MPEG-DASH over HTTP, is currently at the Draft International Standard stage [6]. Note that, at the time of publishing this article, only the referenced draft is publicly available. The specification was further revised in August 2011 and is expected to be published as ISO/IEC 23009-1.

A simple case of adaptive streaming

Figure 1 illustrates a simple example of on-demand, dynamic, adaptive streaming. In this figure, the multimedia content consists of video and audio components. The video source is encoded at three alternative bitrates: 5 Mbps, 2 Mbps, and 500 kilobits per second (Kbps). Additionally, an I-frame-only bitstream with a low frame rate is provided for streaming during trick-mode play. The accompanying audio content is available in two languages: audio 1 is a dubbed English version of the audio track and is encoded in surround sound and in Advanced Audio Coding (AAC) with 128-Kbps and 48-Kbps alternatives, while audio 2 is the original French version, encoded in AAC with 128-Kbps and 48-Kbps alternatives only.

Assume that a device starts streaming the content by requesting segments of the video bitstream at the highest available quality (5 Mbps) and the English audio at 128-Kbps AAC because, for instance, the device doesn’t support surround audio (label 1 in Figure 1). After streaming the first segments of video and audio, and monitoring the effective network bandwidth, the device realizes that the actual available bandwidth is lower than
5 Mbps. So, at the next available switching point, it switches the video down to 2 Mbps by streaming the next segments from the mid-quality track while continuing to stream the 128-Kbps AAC English audio (label 2 in Figure 1). The device continues to monitor the actual network bandwidth and realizes that the network bandwidth has further decreased to a value lower than 2 Mbps. Therefore, to maintain continuous playback, the device further switches the streams down to 500-Kbps video and 48-Kbps audio (label 3 in Figure 1). It continues playing the content at these rates until the network bandwidth increases, and then it switches the video up to 2 Mbps (label 4 in Figure 1). After a while, the user decides to pause and rewind. At this point, the device starts streaming the video from the trick-mode track to play the video in reverse order, while audio is muted (label 5 in Figure 1). At the desired point, the user clicks to play the content with the original French audio. At this point, the device resumes streaming the video from the highest quality (5 Mbps) and the audio from the 128-Kbps French track (label 6 in Figure 1).

This example is perhaps one of the simplest use cases of dynamic streaming of multimedia content. More advanced use cases might include switching between multiple camera views, 3D multimedia content streaming, video streams with subtitles and captions, dynamic ad insertion, low-latency live streaming, mixed streaming and prestored content playback, and others.

[Figure 2 diagram: an HTTP server holding the Media Presentation Description (MPD) and the media segments, connected over HTTP 1.1 to a DASH client that comprises control heuristics, an MPD parser, a segment parser, an HTTP client, and a media player; the MPD is delivered from the server to the client.]
Figure 2. Scope of the MPEG-DASH standard. The formats and functionalities of the red blocks are defined by the specification. The client’s control heuristics and media player aren’t within the standard’s scope.

delivered using HTTP. The content exists on the server in two parts: the Media Presentation Description (MPD), which describes a manifest of the available content, its various alternatives, their URL addresses, and other characteristics; and segments, which contain the actual multimedia bitstreams in the form of chunks, in single or multiple files.

To play the content, the DASH client first obtains the MPD. The MPD can be delivered using HTTP, email, thumb drive, broadcast, or other transports. By parsing the MPD, the DASH client learns about the program timing, media-content availability, media types, resolutions, minimum and maximum bandwidths, and the existence of various encoded alternatives of multimedia components, accessibility features and required digital rights management (DRM), media-component locations on the network, and other content characteristics. Using this information, the DASH client selects the appropriate encoded alternative and starts streaming the content by fetching the segments using HTTP GET requests.

After appropriate buffering to allow for network throughput variations, the client continues fetching the subsequent segments and also monitors the network bandwidth fluctuations. Depending on its measurements, the client decides how to adapt to the available bandwidth by fetching segments of different alternatives (with lower or higher bitrates) to maintain an adequate buffer.

The MPEG-DASH specification only defines the MPD and the segment formats. The delivery of the MPD and the media-encoding formats containing the segments, as well as the client behavior for fetching, adaptation heuristics, and playing content, are outside of MPEG-DASH’s scope.

Multimedia Presentation Description

Dynamic HTTP streaming requires various bitrate alternatives of the multimedia content to be available at the server. In addition, the multimedia content might consist of several media components (for example, audio, video, and text), each of which might have different
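The adaptation behavior described above, in which the client measures throughput and then fetches the next segments from a lower- or higher-bitrate alternative, can be sketched in Python. This is only an illustration under stated assumptions: MPEG-DASH leaves adaptation heuristics to the client, so the `Representation` class, the `select_representation` and `simulate` functions, and the 0.8 safety factor are hypothetical choices made for this sketch, not part of the standard. The bitrates are the three video alternatives from the Figure 1 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One encoded alternative of a media component (hypothetical model)."""
    name: str
    bandwidth_bps: int  # bitrate needed to stream this alternative


# Video alternatives from the Figure 1 example: 5 Mbps, 2 Mbps, 500 Kbps.
VIDEO = [
    Representation("video-high", 5_000_000),
    Representation("video-mid", 2_000_000),
    Representation("video-low", 500_000),
]


def select_representation(alternatives, measured_bps, safety=0.8):
    """Pick the highest-bitrate alternative that fits within a safety
    fraction of the measured throughput; otherwise use the lowest one.

    The safety factor is an assumed heuristic, not something the
    specification defines.
    """
    budget = measured_bps * safety
    fitting = [r for r in alternatives if r.bandwidth_bps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bandwidth_bps)
    return min(alternatives, key=lambda r: r.bandwidth_bps)


def simulate(throughput_samples_bps):
    """Return the name of the alternative chosen at each switching point."""
    return [select_representation(VIDEO, bps).name
            for bps in throughput_samples_bps]


if __name__ == "__main__":
    # Hypothetical throughput measurements, one per switching point.
    print(simulate([8_000_000, 3_000_000, 1_000_000, 4_000_000]))
```

With the hypothetical samples of 8, 3, 1, and 4 Mbps, the sketch selects the high, mid, low, and mid alternatives in turn, mirroring labels 1 to 4 in Figure 1. A real client would typically also take its buffer level into account.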
IEEE MultiMedia
use based on descriptive elements in the MPD, the client’s capabilities, and user’s choices. The client then builds a timeline and starts playing the multimedia content by requesting appropriate media segments. Each representation’s description includes information about its segments, which enables requests for each segment to be formulated in terms of the HTTP URL and byte range. For live presentations, the MPD also provides segment availability

index box [7]. This box describes subsegments and stream access points in the segment by signaling their durations and byte offsets. The DASH client can use the indexing information to request subsegments using partial HTTP GETs. The indexing information of a segment can be put in a single box at the beginning of that segment, or spread among many indexing boxes in the segment. Different methods of spreading are possible, such as hierarchical,
daisy chain, and hybrid. This technique avoids adding a large box at the beginning of the segment and therefore prevents a possible initial download delay.

MPEG-DASH defines segment-container formats for both ISO Base Media File Format [8] and MPEG-2 Transport Streams [9]. MPEG-DASH is media codec agnostic and supports both multiplexed and unmultiplexed encoded content.

Multiple DRM and common encryption

In MPEG-DASH, each adaptation set can use one content-protection descriptor to describe the supported DRM scheme. An adaptation set can also use multiple content-protection schemes and, as long as the client recognizes at least one, it can stream and decode the content.

In conjunction with the MPEG-DASH standardization, MPEG is also developing a common encryption standard, ISO/IEC 23001-7, which defines signaling of a common encryption scheme of media content. Using this standard, the content can be encrypted once and streamed to clients that support different DRM license systems. Each client gets the decryption keys and other required information using its particular supported DRM system, which is signaled in the MPD, and then streams the commonly encrypted content from the same server.

Additional features

The MPEG-DASH specification is a feature-rich standard. Some of the additional features include:

Switching and selectable streams. The MPD provides adequate information to the client for selecting and switching between streams, for example, selecting one audio stream from different languages, selecting video between different camera angles, selecting the subtitles from provided languages, and dynamically switching between different bitrates of the same video camera.

Ad insertion. Advertisements can be inserted as a period between periods or as a segment between segments.

Fragmented manifest. The MPD can be divided into multiple parts, or some of its elements can be externally referenced, which enables the MPD to be downloaded in multiple steps.

Segments with variable durations. The duration of segments can be varied. With live streaming, the duration of the next segment can also be signaled with the delivery of the current segment.

Multiple base URLs. The same content can be available at multiple URLs (that is, at different servers or CDNs), and the client can stream from any of them to maximize the available network bandwidth.

Clock-drift control for live sessions. The UTC time can be included with each segment to enable the client to control its clock drift.

Scalable Video Coding (SVC) and Multiview Video Coding (MVC) support. The MPD provides adequate information regarding the decoding dependencies between representations, which can be used for streaming any multilayer coded streams such as SVC and MVC.

A flexible set of descriptors. These describe content rating, components’ roles, accessibility features, camera views, frame packing, and audio channels’ configuration.

Subsetting adaptation sets into groups. Grouping occurs according to the content author’s guidance.

Quality metrics for reporting the session experience. The standard has a set of well-defined quality metrics for the client to measure and report back to a reporting server.

Most of these features are provided in flexible and extensible ways, making it possible to deploy MPEG-DASH for unforeseen use cases in the future.
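As an illustration of the multiple-base-URL feature listed above, the following Python sketch resolves one relative segment path against several locations and orders the candidates by measured throughput, so a client could try the fastest location first and fall back to the others. The host names, the segment path, the throughput figures, and the function names are all hypothetical; the article doesn’t say how a client should choose among base URLs, so the ranking rule here is an assumption.

```python
from urllib.parse import urljoin


def rank_base_urls(throughput_bps_by_base_url):
    """Order base URLs from highest to lowest measured throughput."""
    return sorted(throughput_bps_by_base_url,
                  key=throughput_bps_by_base_url.get,
                  reverse=True)


def candidate_segment_urls(segment_path, throughput_bps_by_base_url):
    """Resolve a relative segment path against every base URL,
    best-performing location first, so a client can fall back in order."""
    return [urljoin(base, segment_path)
            for base in rank_base_urls(throughput_bps_by_base_url)]


# Hypothetical measurements for two locations that serve the same content.
measured = {
    "http://cdn-a.example.com/movie/": 3_500_000,
    "http://cdn-b.example.com/movie/": 6_000_000,
}

urls = candidate_segment_urls("video/2mbps/segment-0001.m4s", measured)

if __name__ == "__main__":
    for url in urls:
        print(url)
```

Each base URL ends with a slash so that `urljoin` appends the relative path instead of replacing the last path component.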
References
1. T. Sneath, MIX09 Day 1 Keynote Pt 2: Scott Guthrie on Advancing User Experiences, blog, 18 Mar. 2009; http://blogs.msdn.com/b/tims/archive/2009/03/18/mix09-day-1-keynote-pt-2-scott-guthrie-on-advancing-user-experiences.aspx.
2. Cisco Networks, Cisco’s Visual Networking Index Global IP Traffic Forecast 2010-2015, tech. report, June 2011; http://www.cisco.com/en/US/netsol/ns827/networking_solutions_sub_solution.html#~forecast.
3. R. Pantos and E.W. May, "HTTP Live Streaming," IETF Internet draft, work in progress, Mar. 2011.
4. Microsoft, IIS Smooth Streaming Transport Protocol, Sept. 2009; http://www.iis.net/community/files/media/smoothspecs/[MS-SMTH].pdf.
5. T. Stockhammer, TS 26.247 Transparent End-to-End Packet-Switched Streaming Service (PSS); Progressive Download and Dynamic Adaptive Streaming over HTTP, 3GPP, June 2011; http://www.3gpp.org/ftp/Specs/html-info/26247.htm.
6. ISO/IEC FCD 23001-6, Part 6: Dynamic Adaptive Streaming Over HTTP (DASH), MPEG Requirements Group, Jan. 2011; http://mpeg.chiariglione.org/working_documents/mpeg-b/dash/dash-dis.zip.
7. ISO/IEC 14496-12:2008/DAM 3, Information Technology -- Coding of Audio-Visual Objects -- Part 12: ISO Base Media File Format -- Amendment 3: DASH Support and RTP Reception Hint Track Processing, Jan. 2011.