

Industry and Standards
Anthony Vetro
Mitsubishi Electric Research Labs

The MPEG-DASH Standard for Multimedia Streaming Over the Internet

Iraj Sodagar
Microsoft Corporation

Watching the Olympics live over the Internet? Streaming last week's episode of your favorite TV show to your game console? Watching a 24-hour news TV channel on your mobile phone? These use cases might already seem possible as part of our daily lives. In fact, during the 2008 Olympics, NBC reported delivering 3.4 petabytes of video content over the Internet [1]. The truth is, however, that multimedia streaming over the Internet is still in its infancy compared to its potential market. One reason is that today every commercial platform is a closed system with its own manifest format, content formats, and streaming protocols. In other words, no interoperability exists between devices and servers of various vendors. A recent study indicated that in a few years video content could make up the vast majority of Internet traffic [2]. One of the main enablers of this would be an adopted standard that provides interoperability between various servers and devices. Achieving such interoperability will be instrumental for market growth, because a common ecosystem of content and services will be able to provision a broad range of devices, such as PCs, TVs, laptops, set-top boxes, game consoles, tablets, and mobile phones. MPEG Dynamic Adaptive Streaming over HTTP (DASH) was developed to do just that.

Editor's Note
MPEG has recently finalized a new standard to enable dynamic and adaptive streaming of media over HTTP. This standard aims to address the interoperability needs between devices and servers of various vendors. There is broad industry support for this new standard, which offers the promise of transforming the media-streaming landscape.
Anthony Vetro

HTTP streaming
Delivering video content over the Internet started in the 1990s with timely delivery and consumption of large amounts of data being the main challenge. The Internet Engineering Task Force's Real-Time Transport Protocol (RTP) was designed to define packet formats for audio and video content along with stream-session management, which allowed delivery of those packets with low overhead. RTP works well in managed IP networks. However, in today's Internet, managed networks have been replaced by content delivery networks (CDNs), many of which don't support RTP streaming. In addition, RTP packets are often not allowed through firewalls. Finally, RTP streaming requires the server to manage a separate streaming session for each client, making large-scale deployments resource intensive.

With the increase of Internet bandwidth and the tremendous growth of the World Wide Web, the value of delivering audio or video data in small packets has diminished. Multimedia content can now be delivered efficiently in larger segments using HTTP. HTTP streaming has several benefits. First, the Internet infrastructure has evolved to efficiently support HTTP. For instance, CDNs provide localized edge caches, which reduce long-haul traffic. Also, HTTP is firewall friendly because almost all firewalls are configured to support its outgoing connections. HTTP server technology is a commodity and therefore supporting HTTP streaming for millions of users is cost effective. Second, with HTTP streaming the client manages the streaming without having to maintain a session state on the server. Therefore, provisioning a large number of streaming clients doesn't impose any additional cost on server resources beyond standard Web use of HTTP,

1070-986X/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society.

Figure 1. Simple example of dynamic adaptive streaming. Numbered circles demonstrate the action points taken by the device. (Timeline across Periods 1 to 3. Video: 5-Mbps, 2-Mbps, and 0.5-Mbps tracks plus a trick-mode track. Audio 1, English: surround, 128-Kbps AAC, and 48-Kbps AAC. Audio 2, French: 128-Kbps AAC and 48-Kbps AAC.)
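The bitrate switches marked by labels 1 through 4 in Figure 1 can be approximated with a very small throughput-based rule. The sketch below is only an illustration: the function name `select_bitrate`, the `safety` headroom factor, and the bandwidth measurements are assumptions made for this example, and MPEG-DASH itself leaves the adaptation heuristics to the client.

```python
# Video bitrates of the three tracks in Figure 1, in bits per second.
VIDEO_BITRATES_BPS = [5_000_000, 2_000_000, 500_000]


def select_bitrate(measured_bps, bitrates=VIDEO_BITRATES_BPS, safety=0.8):
    """Pick the highest bitrate that fits within a fraction of the measured bandwidth.

    `safety` is a hypothetical headroom factor (not taken from the article).
    If no bitrate fits, fall back to the lowest one so playback can continue.
    """
    budget = measured_bps * safety
    for rate in sorted(bitrates, reverse=True):
        if rate <= budget:
            return rate
    return min(bitrates)


# Illustrative bandwidth measurements at four successive switching points.
measurements_bps = [8_000_000, 3_000_000, 900_000, 3_500_000]
choices = [select_bitrate(m) for m in measurements_bps]
print(choices)  # [5000000, 2000000, 500000, 2000000]
```

With these made-up measurements the client starts at 5 Mbps, drops to 2 Mbps and then 500 Kbps, and climbs back to 2 Mbps, which mirrors the first four action points in the figure. A real client would typically also take buffer occupancy into account.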

and can be managed by a CDN using standard HTTP optimization techniques.

For all of these reasons, HTTP streaming has become a popular approach in commercial deployments. For instance, streaming platforms such as Apple's HTTP Live Streaming [3], Microsoft's Smooth Streaming [4], and Adobe's HTTP Dynamic Streaming (see http://www.adobe.com/products/httpdynamicstreaming) all use HTTP streaming as their underlying delivery method. However, each implementation uses different manifest and segment formats and therefore, to receive the content from each server, a device must support its corresponding proprietary client protocol. A standard for HTTP streaming of multimedia content would allow a standard-based client to stream content from any standard-based server, thereby enabling interoperability between servers and clients of different vendors.

Observing the market prospects and requests from the industry, MPEG issued a Call for Proposal for an HTTP streaming standard in April 2009. Fifteen full proposals were received by July 2009, when MPEG started the evaluation of the submitted technologies. In the two years that followed, MPEG developed the specification with participation from many experts and with collaboration from other standard groups, such as the Third Generation Partnership Project (3GPP) [5]. The resulting standard, known as MPEG-DASH, is currently at the Draft International Standard stage [6]. Note that, at the time of publishing this article, only the referenced draft is publicly available. The specification was further revised in August 2011 and is expected to be published as ISO/IEC 23009-1.

October-December 2011

A simple case of adaptive streaming
Figure 1 illustrates a simple example of on-demand, dynamic, adaptive streaming. In this figure, the multimedia content consists of video and audio components. The video source is encoded at three different alternative bitrates: 5 Mbps, 2 Mbps, and 500 kilobits per second (Kbps). Additionally, an I-frame-only bitstream with a low frame rate is provided for streaming during the trick-mode play. The accompanying audio content is available in two languages: audio 1 is a dubbed English version of the audio track and is encoded in surround sound and in Advanced Audio Coding (AAC) with 128-Kbps and 48-Kbps alternatives, while audio 2 is the original French version, encoded in AAC 128-Kbps and 48-Kbps alternatives only.

Assume that a device starts streaming the content by requesting segments of the video bitstream at the highest available quality (5 Mbps) and the English audio at 128-Kbps AAC because, for instance, the device doesn't support surround audio (label 1 in Figure 1). After streaming the first segments of video and audio, and monitoring the effective network bandwidth, the device realizes that the actual available bandwidth is lower than


5 Mbps. So, at the next available switching point, it switches the video down to 2 Mbps by streaming the next segments from the mid-quality track while continuing streaming of the 128-Kbps AAC English audio (label 2 in Figure 1). The device continues to monitor the actual network bandwidth and realizes that the network bandwidth has further decreased to a value lower than 2 Mbps. Therefore, to maintain continuous playback, the device further switches the streams down to 500-Kbps video and 48-Kbps audio (label 3 in Figure 1). It continues playing the content at these rates until the network bandwidth increases and then it switches the video up to 2 Mbps (label 4 in Figure 1). After a while, the user decides to pause and rewind. At this point, the device starts streaming the video from the trick-mode track to play the video in reverse order, while audio is muted (label 5 in Figure 1). At the desired point, the user clicks to play the content with the original French audio. At this point, the device resumes streaming the video from the highest quality (5 Mbps) and audio from the 128-Kbps French audio (label 6 in Figure 1).

This example is perhaps one of the simplest use cases of dynamic streaming of multimedia content. More advanced use cases might include switching between multiple camera views, 3D multimedia content streaming, video streams with subtitles and captions, dynamic ad insertion, low-latency live streaming, mixed-streaming and prestored content playback, and others.

IEEE MultiMedia

Scope of MPEG-DASH
Figure 2 illustrates a simple streaming scenario between an HTTP server and a DASH client. In this figure, the multimedia content is captured and stored on an HTTP server and is delivered using HTTP. The content exists on the server in two parts: the Media Presentation Description (MPD), which describes a manifest of the available content, its various alternatives, their URL addresses, and other characteristics; and segments, which contain the actual multimedia bitstreams in the form of chunks, in single or multiple files.

Figure 2. Scope of the MPEG-DASH standard. The formats and functionalities of the red blocks are defined by the specification. The clients control the heuristics and media players, which aren't within the standard's scope. (Diagram labels: an HTTP server holding the Media Presentation Description (MPD) and segments; MPD delivery; a DASH client with control heuristics, MPD parser, segment parser, media player, and HTTP client; HTTP 1.1 between server and client.)

To play the content, the DASH client first obtains the MPD. The MPD can be delivered using HTTP, email, thumb drive, broadcast, or other transports. By parsing the MPD, the DASH client learns about the program timing, media-content availability, media types, resolutions, minimum and maximum bandwidths, and the existence of various encoded alternatives of multimedia components, accessibility features and required digital rights management (DRM), media-component locations on the network, and other content characteristics. Using this information, the DASH client selects the appropriate encoded alternative and starts streaming the content by fetching the segments using HTTP GET requests.

After appropriate buffering to allow for network throughput variations, the client continues fetching the subsequent segments and also monitors the network bandwidth fluctuations. Depending on its measurements, the client decides how to adapt to the available bandwidth by fetching segments of different alternatives (with lower or higher bitrates) to maintain an adequate buffer.

The MPEG-DASH specification only defines the MPD and the segment formats. The delivery of the MPD and the media-encoding formats containing the segments, as well as the client behavior for fetching, adaptation heuristics, and playing content, are outside of MPEG-DASH's scope.

Media Presentation Description
Dynamic HTTP streaming requires various bitrate alternatives of the multimedia content to be available at the server. In addition, the multimedia content might consist of several media components (for example, audio, video, and text), each of which might have different characteristics. In MPEG-DASH, these characteristics are described by the MPD, which is an XML document.

Figure 3 demonstrates the MPD hierarchical data model. The MPD consists of one or multiple periods, where a period is a program


interval along the temporal axis. Each period has a starting time and duration and consists of one or multiple adaptation sets. An adaptation set provides the information about one or multiple media components and their various encoded alternatives. For instance, an adaptation set might contain the different bitrates of the video component of the same multimedia content. Another adaptation set might contain the different bitrates of the audio component (for example, lower-quality stereo and higher-quality surround sound) of the same multimedia content. Each adaptation set usually includes multiple representations.

Figure 3. The Media Presentation Description hierarchical data model. In this example, the MPD contains three periods, period 2 contains three adaptation sets, and adaptation set 1 contains four representations, including three representations with various bit rates and one representation for trick mode. Finally, representation 2 consists of its segment info, which in turn includes its initialization segment and four media segments' information. (Diagram labels: periods 1, 2, and 3 start at 0, 60, and 120 seconds; period 2 holds adaptation sets 0, 1, and 2; adaptation set 1 holds representation 1 at 5 Mbps, representation 2 at 2 Mbps, representation 3 at 500 Kbps, and representation 4 for trick mode; the segment info of representation 2 has a duration of 60 seconds, an initialization segment at http://ex.com/il.mp4, and media segments 1 to 4 starting at 0, 15, 30, and 45 seconds at http://ex.com/v1.mp4 through http://ex.com/v4.mp4.)

A representation is an encoded alternative of the same media component, varying from other representations by bitrate, resolution, number of channels, or other characteristics. Each representation consists of one or multiple segments. Segments are the media stream chunks in temporal sequence. Each segment has a URI, that is, an addressable location on a server that can be downloaded using HTTP GET or HTTP GET with byte ranges.

To use this data model, the DASH client first parses the MPD XML document. The client then selects the set of representations it will use based on descriptive elements in the MPD, the client's capabilities, and the user's choices. The client then builds a timeline and starts playing the multimedia content by requesting appropriate media segments. Each representation's description includes information about its segments, which enables requests for each segment to be formulated in terms of the HTTP URL and byte range. For live presentations, the MPD also provides segment availability start time and end time, approximate media start time, and the fixed or variable duration of segments.

Segment format
The multimedia content can be accessed as a collection of segments. A segment is defined as the entity body of the response to the DASH client's HTTP GET or partial HTTP GET. A media component is encoded and divided into multiple segments. The first segment might be an initialization segment containing the required information for initialization of the DASH client's media decoder. It doesn't include any actual media data.

The media stream is then divided into one or multiple consecutive media segments. Each media segment is assigned a unique URL (possibly with byte range), an index, and an explicit or implicit start time and duration. Each media segment contains at least one stream access point, which is a random access or switch-to point in the media stream where decoding can start using only data from that point forward.

To enable downloading segments in multiple parts, the specification defines a method of signaling subsegments using a segment index box [7]. This box describes subsegments and stream access points in the segment by signaling their durations and byte offsets. The DASH client can use the indexing information to request subsegments using partial HTTP GETs. The indexing information of a segment can be put in a single box at the beginning of that segment, or spread among many indexing boxes in the segment. Different methods of spreading are possible, such as hierarchical,


daisy chain, and hybrid. This technique avoids adding a large box at the beginning of the segment and therefore prevents a possible initial download delay.

MPEG-DASH defines segment-container formats for both ISO Base Media File Format [8] and MPEG-2 Transport Streams [9]. MPEG-DASH is media codec agnostic and supports both multiplexed and unmultiplexed encoded content.

Multiple DRM and common encryption
In MPEG-DASH, each adaptation set can use one content-protection descriptor to describe the supported DRM scheme. An adaptation set can also use multiple content-protection schemes and, as long as the client recognizes at least one, it can stream and decode the content.

In conjunction with the MPEG-DASH standardization, MPEG is also developing a common encryption standard, ISO/IEC 23001-7, which defines signaling of a common encryption scheme of media content. Using this standard, the content can be encrypted once and streamed to clients that support different DRM license systems. Each client gets the decryption keys and other required information using its particular supported DRM system, which is signaled in the MPD, and then streams the commonly encrypted content from the same server.

Additional features
The MPEG-DASH specification is a feature-rich standard. Some of the additional features include:

• Switching and selectable streams. The MPD provides adequate information to the client for selecting and switching between streams, for example, selecting one audio stream from different languages, selecting video between different camera angles, selecting the subtitles from provided languages, and dynamically switching between different bitrates of the same video camera.

• Ad insertion. Advertisements can be inserted as a period between periods or a segment between segments in both on-demand and live cases.

• Compact manifest. The segments' address URLs can be signaled using a template scheme, resulting in a compact MPD.

• Fragmented manifest. The MPD can be divided into multiple parts or some of its elements can be externally referenced, enabling downloading the MPD in multiple steps.

• Segments with variable durations. The duration of segments can be varied. With live streaming, the duration of the next segment can also be signaled with the delivery of the current segment.

• Multiple base URLs. The same content can be available at multiple URLs (that is, at different servers or CDNs), and the client can stream from any of them to maximize the available network bandwidth.

• Clock-drift control for live sessions. The UTC time can be included with each segment to enable the client to control its clock drift.

• Scalable Video Coding (SVC) and Multiview Video Coding (MVC) support. The MPD provides adequate information regarding the decoding dependencies between representations, which can be used for streaming any multilayer coded streams such as SVC and MVC.

• A flexible set of descriptors. These describe content rating, components' roles, accessibility features, camera views, frame packing, and audio channels' configuration.

• Subsetting adaptation sets into groups. Grouping occurs according to the content author's guidance.

• Quality metrics for reporting the session experience. The standard has a set of well-defined quality metrics for the client to measure and report back to a reporting server.

Most of these features are provided in flexible and extensible ways, enabling the possibility of deploying MPEG-DASH for unforeseeable use cases in the future.

What's next?
The specification defines five specific profiles, each addressing a different class of applications. Each profile defines a set of constraints, limiting the MPD and segment formats to a subset


of the entire specification. Therefore, a DASH client conforming to a specific profile is only required to support those required features and not the entire specification. Some profiles are specifically designed to use the legacy content and therefore provide a migration path for the existing nonstandard solutions to a standard one.

Several other standard organizations and consortia are collaborating with MPEG to reference MPEG-DASH in their own specifications. At the same time, it seems that industry is moving quickly to provide solutions based on MPEG-DASH. Some open source implementations are also on the way. It's believed that the next two years will be a crucial time for the industry (including content and service providers, platform providers, software vendors, CDN providers, and device manufacturers) to adopt this standard and build an interoperable ecosystem for multimedia streaming over the Internet.

References
1. T. Sneath, MIX09 Day 1 Keynote Pt 2: Scott Guthrie on Advancing User Experiences, blog, 18 Mar. 2009; http://blogs.msdn.com/b/tims/archive/2009/03/18/mix09-day-1-keynote-pt-2-scott-guthrie-on-advancing-user-experiences.aspx.
2. Cisco Networks, Cisco's Visual Networking Index Global IP Traffic Forecast 2010-2015, tech. report, June 2011; http://www.cisco.com/en/US/netsol/ns827/networking_solutions_sub_solution.html#~forecast.
3. R. Pantos and E.W. May, "HTTP Live Streaming," IETF Internet draft, work in progress, Mar. 2011.
4. Microsoft, IIS Smooth Streaming Transport Protocol, Sept. 2009; http://www.iis.net/community/files/media/smoothspecs/[MS-SMTH].pdf.
5. T. Stockhammer, TS 26.247 Transparent End-to-End Packet-Switched Streaming Service (PSS); Progressive Download and Dynamic Adaptive Streaming over HTTP, 3GPP, June 2011; http://www.3gpp.org/ftp/Specs/html-info/26247.htm.
6. ISO/IEC FCD 23001-6, Part 6: Dynamic Adaptive Streaming Over HTTP (DASH), MPEG Requirements Group, Jan. 2011; http://mpeg.chiariglione.org/working_documents/mpeg-b/dash/dash-dis.zip.
7. ISO/IEC 14496-12:2008/DAM 3, Information Technology - Coding Of Audio-Visual Objects - Part 12: ISO Base Media File Format - Amendment 3: DASH Support and RTP Reception Hint Track Processing, Jan. 2011.
8. Information Technology - Coding of Audio-Visual Objects - Part 12: ISO Base Media File Format, ISO/IEC 14496-12, 2008.
9. ITU-T Rec. H.222.0|ISO/IEC 13818-1, Information Technology - Generic Coding of Moving Pictures and Associated Audio Information: Systems, ITU-T/ISO/IEC, 2007; http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=44169.

Contact author Iraj Sodagar at irajs@microsoft.com.

Contact editor Anthony Vetro at avetro@merl.com.
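
As a rough illustration of the MPD hierarchy described in this article (periods containing adaptation sets, which in turn contain representations), the sketch below walks a toy XML document with Python's standard library. The document is a simplified assumption made for this example: it omits the XML namespace and most of the attributes a real MPD carries, and the element and attribute names used here should be checked against the specification before being relied on. The helper name `list_representations` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy manifest loosely modeled on period 2 of Figure 3 (not a complete MPD).
TOY_MPD = """
<MPD>
  <Period id="2" start="PT60S">
    <AdaptationSet id="1" contentType="video">
      <Representation id="1" bandwidth="5000000"/>
      <Representation id="2" bandwidth="2000000"/>
      <Representation id="3" bandwidth="500000"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


def list_representations(mpd_xml):
    """Return (period id, adaptation set id, representation id, bandwidth) tuples."""
    root = ET.fromstring(mpd_xml)
    rows = []
    for period in root.findall("Period"):
        for adaptation_set in period.findall("AdaptationSet"):
            for rep in adaptation_set.findall("Representation"):
                rows.append((
                    period.get("id"),
                    adaptation_set.get("id"),
                    rep.get("id"),
                    int(rep.get("bandwidth")),
                ))
    return rows


for row in list_representations(TOY_MPD):
    print(row)
```

A client could feed the resulting bandwidth values into whatever adaptation heuristic it uses to decide which representation's segments to request next.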
