1. INTRODUCTION
   Startup point
   Network streaming
2. TECHNOLOGIES
   MPEG
   RTP
      About realtime networking
      Multimedia over Internet and other TCP/IP networks
      Some solutions
      RTP - Realtime Transport Protocol
      Development
      RTP operation
      RTP fixed header fields
      RTP features
An MPEG archival and live streaming system over RTP/UDP networks
   The Approach
   Introduction
7. SOURCES
8. ANNEX
1. Introduction
There has recently been a flood of interest in the delivery of multimedia services over digital networks, in particular the Internet. The growing popularity of Internet telephony and of streaming audio and video services (such as those provided by Real Audio from Real Networks, Windows Media from Microsoft or QuickTime from Apple) is a clear indicator of this trend.
With the advent of the digital era, breakthroughs in video compression and in network protocols and technologies have given the term 'streaming' new significance. In the digital world, streaming denotes the action of transmitting multimedia content, in particular audio and video, over a network. The flow of digital data is streamed through the network following different patterns: multicast (one transmitter, multiple receivers) or unicast (one transmitter, one receiver).
On the other hand, MPEG [1] is the unquestioned standard for digital video compression. It is generally accepted that no previous standard has enjoyed such wide acceptance and quick deployment, and MPEG still has a long way to go. Even though the standard dates back to the beginning of the 90s and multiple applications are already using it (digital satellite broadcasts, Digital Video Disks (DVD), digital video cameras and others), new technologies in other areas keep opening up new uses for MPEG.
In particular, the high speeds at which digital networks operate these days make it possible to transmit MPEG content over those networks, enabling a whole new range of applications such as personal teleconferencing, surveillance and distance learning, among many others.
This project, developed between September 2000 and March 2001 at Innovacom Inc. [2], works in that direction, showing an actual implementation of new applications of MPEG technology, in particular MPEG archival and transmission, as well as RTP/UDP MPEG streaming.
[1] MPEG, referring to the ISO/IEC 11172 and ISO/IEC 13818 standards, also known as MPEG-1 and MPEG-2.
[2] Innovacom Inc., 3400 Garrett Drive, Santa Clara, CA 95054, USA. www.transpeg.com
Startup point
The work carried out in this project was an addition to the MediaWEB™ series of products from Innovacom Inc. The system in place was completely based on MPEG technology, both in its software components and in its hardware parts.
Some of the already implemented features of the system were:
- Live MPEG video encoding. Using state-of-the-art MPEG hardware encoders, the system converts analog video and audio into a compressed MPEG stream, and can tweak its most common parameters, such as bitrate, GOP structure and MPEG format (MPEG-1 or MPEG-2).
- Live MPEG video decoding. Relying on high quality hardware MPEG decoders, the system can output uncompressed video through analog and digital interfaces.
- Network transmitting & receiving. The system had support for two different types of networking: ATM and TCP. The basic functionality was the ability to send and receive MPEG data over ATM and TCP networks.
Some important pieces were missing from the system in place. The first was some kind of archival feature: the ability to store MPEG streams into files and, at the same time, to stream those stored files either to the network or to an MPEG decoder (either hardware or software).
There was also the need for a more adequate system for transmitting MPEG data over TCP/IP networks: some kind of protocol used over the network that would make it possible to track problems in the transmission (packet loss, changes in packet order and so on) and react to them.
After analyzing the existing features, new specific requirements for the system were drawn up; they are outlined next:
MPEG Archival
- The system should be able to store live incoming streams from an unspecified source.
- The saved MPEG streams should be usable in most MPEG editing applications, and should be MPEG compliant streams.
- A high compatibility ratio should be obtained with files stored by the system: any MPEG compliant device should be able to play them flawlessly.
- No assumption should be made about the way the structure of the MPEG stream is set up.
- The system should be able to detect interruptions in the MPEG stream and react accordingly.
- The system should support System Streams, Program Streams, Transport Streams and Elementary Streams [3].
Network streaming
- The new network streaming protocol should support packet loss detection.
- It should support multiplexing and checksum services in some layer of the network structure.
- The system should support System Streams, Program Streams, Transport Streams and Elementary Streams.
[3] Elementary streams include MPEG-1 and MPEG-2 audio and video.
2. Technologies
This project is based on two main technologies: the MPEG standard [4] and RTP, the real-time transport protocol. It is desirable that the reader be familiar with both in order to understand the concepts and features deployed in this work.
MPEG
Although the MPEG standard was developed at the beginning of the 90s, it was not until recently that MPEG technology flourished and matured. All current digital video standards for broadcasting applications are based on MPEG, like DVD (Digital Versatile Disk), DSS, DVB, DAB and an astounding number of other applications.
However, it was interesting to find that a feature as basic as MPEG archival is not widespread, at least in the sense in which it is treated here. MPEG archival from existing MPEG content is a rare feature that we have tried to bring to life from scratch, given the lack of existing references we have found. MPEG file streaming is not a very prominent feature either, so an unbiased approach has been taken.
[4] ISO/IEC 11172 and ISO/IEC 13818.
We overview here some basic facts about MPEG systems. Although most of the explanations are MPEG-2 specific, they can very easily be applied to the case of MPEG-1 if one thinks of MPEG-2 program streams.
MPEG-2 Systems is an ISO/IEC standard (13818-1) that defines the syntax and
semantics of bitstreams in which digital audio and visual data are multiplexed. Such
bitstreams are said to be MPEG-2 Systems compliant. The MPEG specification does not
mandate, however, how equipment that produces, transmits, or decodes such bitstreams
should be designed. As a result, the specification can be used in a diverse array of
environments, including local storage, broadcast (terrestrial and satellite), as well as
interactive environments.
MPEG-2 Systems provides a two-layer multiplexing approach. The first layer is dedicated to ensuring tight synchronization between video and audio. It is a common way of presenting all the different materials which require synchronization (video, audio, and private data). This layer is called the Packetized Elementary Stream (PES). The second layer depends on the intended communication medium. The specification for error-free environments such as local storage is called the MPEG-2 Program Stream, while the specification addressing error-prone environments is called the MPEG-2 Transport Stream.
The differences between the MPEG-2 Program Stream and MPEG-1 Systems are mild. MPEG-2 Systems mandated compatibility with MPEG-1 Systems, and the MPEG-2 Program Stream is designed for that purpose. MPEG-2 Systems also addresses error-prone environments, and provides all the hooks for Conditional Access systems. The major difference lies in the signalling which is present in MPEG-2 Program Streams and was absent in MPEG-1 Systems. A minor difference also exists in the PES format.
MPEG-2 Transport Streams carry transport packets. These packets carry two
types of information: the compressed material and the associated signalling tables. A
transport packet is identified by its PID (Packet Identifier). Each PID is assigned to carry
data belonging either to one particular compressed data source (and only this data source)
or one particular signaling table. The ordered sequence of packets with a given PID may
be considered as one data stream. The compressed data source may be derived from either
video, audio or data elementary streams. These elementary streams may be tightly
synchronized (as it is usually necessary for Digital TV programs, or for Digital Radio
programs), or not synchronized (in the case of programs offering downloading of
software or games, as an example).
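To make the PID mechanism concrete, here is a minimal sketch (in Python, with illustrative helper names) that extracts the 13-bit PID from a transport packet header and splits a packet sequence into one substream per PID:

```python
from collections import defaultdict

TS_PACKET_SIZE = 188  # every transport packet is exactly 188 bytes
SYNC_BYTE = 0x47      # fixed first byte of every transport packet

def ts_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from a transport packet header."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid transport packet")
    # The PID occupies the low 5 bits of byte 1 and all 8 bits of byte 2.
    return ((packet[1] & 0x1F) << 8) | packet[2]

def demux(stream: bytes) -> dict:
    """Split a transport stream into an ordered packet list per PID."""
    substreams = defaultdict(list)
    for off in range(0, len(stream), TS_PACKET_SIZE):
        packet = stream[off:off + TS_PACKET_SIZE]
        substreams[ts_pid(packet)].append(packet)
    return dict(substreams)
```

Each resulting per-PID list is then one data stream (compressed material or a signalling table), as described above.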
Synchronization
Synchronization inside MPEG systems is achieved through timestamps. There are two types of time stamps:
- The first type is usually called a reference time stamp. This time stamp is a sample of the clock of the encoder that was used to generate the mux stream. Reference time stamps are to be found in the PES syntax (ESCR), in the program syntax (SCR), and in the transport syntax (PCR).
- The second type of time stamp is called DTS (Decoding Time Stamp) or PTS (Presentation Time Stamp). They indicate the exact moment when a video frame or an audio frame has to be decoded or presented to the user, respectively.
Although timestamps are not mandatory some applications like Digital TV
broadcast, where tight synchronization is required, will make an extensive use of them. In
that case both reference time stamp and DTS/PTS are used. In other cases (game or
software downloading for example) neither reference nor DTS/PTS time stamps are
necessary. DTS and PTS time stamps are not relevant if reference time stamps are not
present.
PTS and DTS are inserted as close as possible to the portions of compressed
video, audio, or data to which they apply. Precisely, this means that they are inserted in
the PES packet headers, a syntax which is common to all data sources.
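PTS and DTS values are samples of a 90 kHz clock, stored as 33-bit counters (the same 1/90000-second granularity appears later in the vbv_delay definition). A small sketch of working with them:

```python
PTS_CLOCK_HZ = 90_000   # PTS/DTS count ticks of a 90 kHz clock
PTS_WRAP = 1 << 33      # the counters are 33 bits wide and wrap around

def pts_to_seconds(ticks: int) -> float:
    """Convert a PTS/DTS value to seconds."""
    return ticks / PTS_CLOCK_HZ

def pts_delta(later: int, earlier: int) -> int:
    """Tick difference between two stamps, tolerating one 33-bit wrap."""
    return (later - earlier) % PTS_WRAP
```

At 25 frames per second, for instance, consecutive pictures are 3600 ticks (40 ms) apart.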
STD Model
A system target decoder (STD) model is a virtual decoder. There are two models, one within the MPEG-2 program syntax (the P-STD), the other within the MPEG-2 transport syntax (the T-STD). A model defines buffer sizes, their input and output rates, and timing constraints related to time stamp values.
The STD model was devised so as not to be implementation dependent. The first model comes from MPEG-1 Systems. Some of the assumptions in the T-STD are not realistic at all: buffers, for instance, are supposed to be emptied instantaneously when decoding occurs.
Next we depict the syntax of MPEG-2, highlighting those elements that are used later in this project and are key to the way our systems work.
[Figure: PES packet syntax — packet start code prefix (24 bits), stream id (8 bits), PES packet length (16 bits), optional PES header, and PES packet data bytes.]
[Figure: an MPEG-2 transport stream is a sequence of 188-byte transport packets, each consisting of a transport packet header followed by its payload.]
[Figure: system/pack layer syntax. A pack consists of the pack start code (32 bits), a marker, the System Clock Reference (SCR), the program mux rate (22 bits) and stuffing, followed by an optional system header and PES packets. The system header carries the header start code (32 bits), header length (16 bits), rate bound (22 bits), audio bound (6 bits), fixed and CSPS flags (1 bit each), audio and video lock flags (1 bit each) and video bound (5 bits), followed by, for each stream, the stream id (8 bits), a '11' marker (2 bits), the P-STD buffer bound scale (1 bit) and the P-STD buffer size bound (13 bits).]
For simplicity, an explanation of the MPEG-1 syntax has been chosen. MPEG-2 inherits MPEG-1's syntax and adds some extensions, but for our purposes familiarity with the MPEG-1 syntax will be enough. A brief explanation of the bitstream structure is presented, and some of the fields relevant to our study are explained.
Overview
A sequence is the top video level of coding. It begins with a sequence header
which defines important parameters needed by the decoder. The sequence header is
followed by one or more groups of pictures. Groups of pictures, as the name suggests,
consist of one or more individual pictures. The sequence may contain additional sequence
headers. A sequence is terminated by a sequence_end_code. The MPEG standard allows
considerable flexibility in specifying application parameters such as bit rate, picture rate,
picture resolution, and picture aspect ratio. All these parameters are specified in the
sequence header.
Sequence
The encoder may set such parameters as the picture size and aspect ratio in the
sequence header, to define the resources that a decoder requires. In addition, user data
may be included.
A coded sequence begins with a sequence header and the header starts with the
sequence start code. Its value is:
hex: 00 00 01 B3
This is a unique string of 32 bits that cannot be emulated anywhere else in the
bitstream, and is byte-aligned, as are all start codes. To achieve byte alignment the
encoder may precede the sequence start code with any number of zero bits. These can
have a secondary function of preventing decoder input buffer underflow. This procedure
is called bit stuffing, and may be done before any start code. The stuffing bits must all be
zero. The decoder discards all such stuffing bits.
The sequence start code, like all video start codes, begins with a string of 23 zeros.
The coding scheme ensures that such a string of consecutive zeros cannot be produced by
any other combination of codes, i.e. it cannot be emulated by other codes in the bitstream.
This string of zeros can only be produced by a start code, or by stuffing bits preceding a
start code.
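Because the 23-zero prefix cannot be emulated, a decoder can locate start codes with a plain byte search. A minimal sketch:

```python
SEQUENCE_CODE, GOP_CODE, PICTURE_CODE = 0xB3, 0xB8, 0x00  # start code values

def find_start_codes(data: bytes):
    """Yield (offset, code) for every byte-aligned 00 00 01 xx start code.

    The offset points at the first byte of the 00 00 01 prefix; the code is
    the byte that follows it.
    """
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        if i + 3 < len(data):
            yield i, data[i + 3]
        i = data.find(b"\x00\x00\x01", i + 4)
```

Stuffing zeros before a start code are skipped naturally, since the search locates the prefix itself rather than counting zeros.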
Vertical Size
This is a 12-bit number representing the height of the picture in pels, i.e. the
vertical resolution. It is an unsigned integer with the most significant bit first. A value of
zero is not allowed (to avoid start code emulation) so the legal range is from 1 to 4095. In
practice values are usually a multiple of 16. At 1.5 Mbps, a popular vertical resolution is
240 to 288 pels. Values of 240 pels are convenient for interfacing to 525-line NTSC
systems, and values of 288 pels are more appropriate for 625-line PAL and SECAM
systems.
If the vertical resolution is not a multiple of 16 lines, the encoder must fill out the
picture at the bottom to the next higher multiple of 16 so that the last few lines can be
coded in a macroblock. The decoder will discard these extra lines before display.
Horizontal Size
This is a 12-bit number representing the width of the picture in pels, i.e. the
horizontal resolution. It is an unsigned integer with the most significant bit first. A value
of zero is not allowed (to avoid start code emulation) so the legal range is from 1 to 4095.
In practice values are usually a multiple of 16. At 1.5 Mbps, a popular horizontal
resolution is 352 pels. The value 352 is derived from half the CCIR 601 horizontal
resolution of 720, rounded down to the nearest multiple of 16 pels. Otherwise the encoder
must fill out the picture on the right to the next higher multiple of 16 so that the last few
pels can be coded in a macroblock. The decoder will discard these extra pels before
display.
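The horizontal and vertical sizes are the first two fields after the sequence start code: 12 bits each, packed into three bytes. A sketch of reading them, together with the round-up-to-macroblock rule described above:

```python
def sequence_size(header: bytes):
    """Return (horizontal_size, vertical_size) from a sequence header."""
    if header[:4] != b"\x00\x00\x01\xb3":
        raise ValueError("missing sequence start code")
    width = (header[4] << 4) | (header[5] >> 4)     # first 12 bits
    height = ((header[5] & 0x0F) << 8) | header[6]  # next 12 bits
    return width, height

def padded_to_macroblocks(size: int) -> int:
    """Round a dimension up to the next multiple of 16 (whole macroblocks)."""
    return (size + 15) // 16 * 16
```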
Picture Rate
The allowed picture rates correspond to commonly available analog or digital sources. One advantage of not allowing greater flexibility in picture rates is that standard techniques may be used to convert to the display rate of the decoder if it does not match the coded rate.
Bit Rate
The bit rate is an 18-bit integer giving the bit rate of the data channel in units of
400 bps. The bit rate is assumed to be constant for the entire sequence. The actual bit
rate is rounded up to the nearest multiple of 400 bps. For example, a bit rate of 830100
bps would be rounded up to 830400 bps giving a coded bit rate of 2076 units.
If all 18 bits are 1 then the bitstream is intended for variable bit rate operation.
The value zero is forbidden.
For constant bit rate operation, the bit rate is used by the decoder in conjunction
with the vbv_delay parameter in the picture header to maintain synchronization of the
decoder with a constant rate data channel. If the stream is multiplexed using Part 1 of this
standard, the time-stamps and system clock reference information defined in Part 1
provide a more appropriate tool for performing this function.
The buffer size is a 10-bit integer giving the minimum required size of the input
buffer in the model decoder in units of 16384 bits (2048 bytes). For example, a buffer
size of 20 would require an input buffer of 20 x 16384 = 327680 bits (= 40960 bytes).
Decoders may provide more memory than this, but if they provide less they will probably
run into buffer overflow problems while the sequence is being decoded.
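The unit conversions for these two fields can be sketched as follows; the 830100 bps example from the text is reproduced:

```python
import math

VBR_FIELD = (1 << 18) - 1  # all ones: variable bit rate operation

def encode_bit_rate(bps: int) -> int:
    """Encode a channel bit rate into the 18-bit field (units of 400 bps),
    rounding up as the standard requires; zero is forbidden."""
    units = math.ceil(bps / 400)
    if units == 0 or units >= VBR_FIELD:
        raise ValueError("bit rate out of range for constant-rate coding")
    return units

def buffer_bits(vbv_buffer_size: int) -> int:
    """Minimum model-decoder input buffer: the 10-bit field counts units
    of 16384 bits."""
    return vbv_buffer_size * 16384
```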
Group of Pictures
Two distinct picture orderings exist, the display order and the bitstream order (as
they appear in the video bitstream). A group of pictures (gop) is a set of pictures which
are contiguous in display order. A group of pictures must contain at least one I picture.
This required picture may be followed by any number of I and P pictures. Any number of
B pictures may be interspersed between each pair of I or P pictures, and may also precede
the first I picture.
A group of pictures, in bitstream order, must start with an I picture and may be
followed by any number of I, P or B pictures in any order.
Another property of a group of pictures is that it must begin, in display order, with
an I or a B picture, and must end with an I or a P picture. The smallest group of pictures
consists of a single I picture, whereas the largest size is unlimited.
The original concept of a group of pictures was a set of pictures that could be
coded and displayed independently of any other group. In the final version of the MPEG
standard this is not always true, and any B pictures preceding (in display order) the first I
picture in a group may require the last picture in the previous group in order to be
decoded. Nevertheless encoders can still construct groups of pictures which are
independent of one another. One way to do this is to omit any B pictures preceding the
first I picture. Another way is to allow such B pictures, but to code them using only
backward motion compensation.
Some legal groups of pictures, in display order:

  I
  I P P
  I B P B P
  B B I B P B P
  B B I B B P B B P B B P
  B I B B B B P B I B B I I
The group of pictures header starts with the Group of Pictures start code. This
code is byte-aligned and is 32 bits long. Its value is:
hex: 00 00 01 B8
It may be preceded by any number of zeros. The encoder may have inserted some
zeros to get byte alignment, and may have inserted additional zeros to prevent buffer
underflow. An editor may have inserted zeros in order to match the vbv_delay parameter
of the first picture in the group.
Time Code
A time code of 25 bits immediately follows the group of pictures start code. This
time code conforms to the SMPTE time code [6].
The time code can be broken down into six fields as shown in the following table:

  field                 number of bits
  drop_frame_flag       1
  time_code_hours       5
  time_code_minutes     6
  marker_bit            1
  time_code_seconds     6
  time_code_pictures    6
The time code refers to the first picture in the group in display order, i.e. the first
picture with a temporal reference of zero.
Closed GOP
A one bit flag follows the time code. It denotes whether the group of pictures is
open or closed. Closed groups can be decoded without using decoded pictures of the
previous group for motion compensation, whereas open groups require such pictures to be
available.
  I B B P B B P B B P B B P
  0 1 2 3 4 5 6 7 8 9 10 11 12
  (a closed group)

  B B I B B P B B P B B P B B P
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
A less typical example of a closed group is shown in this last example. In it, the B
pictures which precede the first I picture must use backward motion compensation only,
i.e. any motion compensation must be based only on picture number 2 in the group.
If the closed_gop flag is set to 0 then the group is open. The first B pictures in the
group may have been encoded using the last picture in the previous group for motion
compensation.
Broken Link
A one bit flag follows the closed gop flag. It denotes whether the previous group
of pictures can be used to decode the current group. Encoders normally set this flag to 0
indicating that the previous group can be used for decoding. If the sequence has been
edited so that the original group of pictures no longer precedes the current group, then this
flag must be set to 1 by the editor.
If the group is closed, then the flag is less useful. It is suggested that encoders still
set it to zero, and editors set it to 1 so that decoders can detect if the bitstream has been
edited at that point.
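The GOP header fields described in the last three sections (time code, closed_gop, broken_link) occupy 27 bits following the start code. A sketch of unpacking them; the bit packing below follows the MPEG-1 video syntax:

```python
import struct

def parse_gop_header(data: bytes) -> dict:
    """Unpack the fields after the GOP start code 00 00 01 B8: the 25-bit
    SMPTE time code, then the closed_gop and broken_link flags (1 bit each)."""
    if data[:4] != b"\x00\x00\x01\xb8":
        raise ValueError("missing group of pictures start code")
    (v,) = struct.unpack(">I", data[4:8])  # 27 meaningful bits, MSB first
    return {
        "drop_frame":  bool(v >> 31),
        "hours":       v >> 26 & 0x1F,
        "minutes":     v >> 20 & 0x3F,   # the marker bit at position 19 is skipped
        "seconds":     v >> 13 & 0x3F,
        "pictures":    v >> 7 & 0x3F,
        "closed_gop":  bool(v >> 6 & 1),
        "broken_link": bool(v >> 5 & 1),
    }
```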
Picture
The picture layer contains all the coded information for one picture. The header
identifies the temporal reference of the picture, the picture coding type, the delay in the
video buffer verifier (VBV) and, if appropriate, the range of the vectors used.
The syntax
A picture begins with a picture header. The header starts with a picture start code.
This code is byte-aligned and is 32 bits long. Its value is:
hex: 00 00 01 00
Temporal Reference
The Temporal Reference is a ten-bit number which can be used to define the order in which the pictures must be displayed. It is useful since pictures are not transmitted in display order, but in the order in which the decoder needs to decode them. The first picture, in display order, in each group must have a Temporal Reference equal to zero, which is incremented by one for each subsequent picture in the group.
Some example groups of pictures with their Temporal Reference numbers are given below:

  Example (a) in display order:    I B P B P
                                   0 1 2 3 4
  Example (a) in decoding order:   I P B P B
                                   0 2 1 4 3

  Example (b) in display order:    B B I B B P B B P B B P
                                   0 1 2 3 4 5 6 7 8 9 10 11
  Example (b) in decoding order:   I B B P B B P B B P B B
                                   2 0 1 5 3 4 8 6 7 11 9 10

  Example (c) in display order:    B I B B B B P B I B B I I
                                   0 1 2 3 4 5 6 7 8 9 10 11 12
  Example (c) in decoding order:   I B P B B B B I B I B B I
                                   1 0 6 2 3 4 5 8 7 11 9 10 12
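The mapping from display order to decoding order in these examples follows one rule: every I or P picture is transmitted before the B pictures that immediately precede it in display order. A sketch:

```python
def decoding_order(display: str):
    """Reorder a group of pictures given in display order (e.g. "IBPBP")
    into bitstream order, returning (picture_type, temporal_reference) pairs."""
    out, pending_b = [], []
    for ref, kind in enumerate(display):
        if kind == "B":
            pending_b.append(("B", ref))   # B pictures wait for their next anchor
        else:
            out.append((kind, ref))        # the I or P anchor goes first...
            out.extend(pending_b)          # ...followed by the B pictures it anchors
            pending_b.clear()
    return out
```

Applied to example (a), this yields Temporal References 0 2 1 4 3, matching the decoding order given above.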
If there are more than 1023 pictures in a group, the Temporal Reference is reset to zero and then increments anew.
A three bit number follows the temporal reference. This is an index into the following table defining the type of picture:

  picture_coding_type   picture type
  001                   I (intra coded)
  010                   P (predictive coded)
  011                   B (bidirectionally predictive coded)
  100                   D (DC intra coded)
VBV Delay
For constant bit rate operation, vbv_delay defines the current state of the VBV
buffer (VBV is an acronym for Video Buffer Verifier - the model decoder). It specifies
how many bits it should contain when bits for all previous pictures have been removed,
and the model decoder is about to start decoding the current picture.
Its purpose is to allow the decoder to synchronize its clock with the encoding
process, and to allow the decoder to determine when to start decoding a picture after
random access in order not to run into future problems of buffer overflow or underflow.
The buffer fullness is not specified in bits but rather in units of time. The
vbv_delay is a 16-bit number defining the time needed in units of 1/90000 second to fill
the input buffer of the model decoder from an empty state to the current state at the bit
rate specified in the sequence header.
For example, suppose the vbv_delay had a decimal value of 30000; then the time delay would be:

  30000 / 90000 = 1/3 second

If the channel bit rate were 1.2 Mbps, then the contents of the buffer before the picture is decoded would be:

  1200000 bits/s x 1/3 s = 400000 bits
If the decoder determined that its actual buffer fullness differed significantly from
this value, then it would have to adopt some strategy for regaining synchronization.
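The vbv_delay arithmetic above can be captured in one line:

```python
def vbv_fullness_bits(vbv_delay: int, bit_rate_bps: float) -> float:
    """Buffer occupancy implied by vbv_delay: the delay counts 1/90000-second
    units, during which the buffer fills at the channel bit rate."""
    return bit_rate_bps * vbv_delay / 90_000
```

For the example in the text (vbv_delay of 30000 at 1.2 Mbps) this gives 400000 bits.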
RTP
Multimedia networking [5] faces many technical challenges: real-time data over non-realtime networks, high data rates over limited network bandwidth, and unpredictable availability of network bandwidth.
Almost all multimedia applications require real-time traffic, which is very different from non-real-time data traffic. If the network is congested, the only effect on non-real-time traffic is that the transfer takes longer to complete. In contrast, real-time data becomes obsolete if it doesn't arrive in time. As a consequence, real-time applications deliver poor quality during periods of congestion.
On the other hand, bandwidth is not the only problem. For most multimedia applications the receiver has a limited buffer: if data arrives too fast, the buffer overflows and some data is lost, also resulting in poor quality.
Therefore, protocols for realtime applications must be worked out to achieve real multimedia networking.
Because of their shared nature, at first glance datagram networks do not seem suitable for real-time traffic. Packets are routed independently across shared networks, so transit times vary significantly (jitter [6]). A class of real-time applications called playback applications aims to solve the jitter problem by buffering data at the receiver. Adaptive applications adapt to changing delays and
[5] Multimedia networking is understood as building multimedia on networks and distributed systems, so that different users on different machines can share image, sound, video and voice, and communicate with each other through these tools.
[6] Jitter: variations in transit delay.
work well on moderately loaded datagram networks. They can deal with jitter caused by
short-lived bursts, and they can tolerate occasional lost packets during brief periods of
congestion.
However, parts of the Internet are often heavily loaded. The price tag attached to shared bandwidth is congestion, leading to jitter and packet loss. At certain times of the day, some MBone audio multicasts are unintelligible because of more than 30% packet loss. While real-time traffic contributes heavily to congestion because of its large bandwidth requirements, it also suffers more from congestion than non-real-time traffic does.
To cope with congestion, several approaches have been proposed in which the
application adapts to the available bandwidth by switching to a different encoding.
Adaptive encoding mechanisms help to keep up useful service during congestion, but
they are not a general solution. Real-time applications are useless when the available
bandwidth drops below a certain minimum bandwidth or when transit times vary so much
that interactivity is impossible.
Some solutions
The Integrated Services working group in the IETF (Internet Engineering Task Force) developed an enhanced Internet service model that includes best-effort service and real-time service. The Resource Reservation Protocol (RSVP), together with the Realtime Transport Protocol (RTP), the Real-Time Control Protocol (RTCP) and the Real-Time Streaming Protocol (RTSP), provides a working foundation for this architecture: a comprehensive approach to providing applications with the type of service they need, in the quality they choose.
In the framework of our project, where MPEG transmission using some kind of streaming transport protocol was required, RTP stood out as the clear option for streaming MPEG over TCP/IP networks. RTP provides a thin layer between the actual MPEG stream data and the TCP/IP framing, and at the same time provides basic features such as sequencing for detecting data loss, time stamping and payload type identification. Because of the importance of RTP for this project, let's review the protocol in detail.
Because of their unpredictable delay and availability, TCP and plain UDP are not suitable on their own for applications with a realtime character. The realtime transport protocol (RTP) is a thin protocol providing support for applications with real-time properties, including timing reconstruction, loss detection, security and content identification. RTP can be used without RTCP if desired. RTP is transport-independent, so it can be used over CLNP (Connectionless Network Protocol), IPX (Internetwork Packet Exchange) or other protocols; RTP is currently also in experimental use directly over AAL5/ATM.
Development
After some initial experiments, which go back to the early 70's, research in the
field of audio transmission over the Internet increased enormously. Voice experiments
within the DARTnet (ARPA network) in 1991 formed the groundwork for RTP based
MBone transmissions.
RTP was finally approved on Nov 22, 1995 by the IESG as a proposed standard. By that time several non-backward-compatible changes had been made, resulting in RTP version 2, which has been published as RFC 1889.
The latest extensions have been made by an industry alliance around Netscape Inc., which uses RTP as the basis of its Real Time Streaming Protocol (RTSP).
RTP operation
There are two transport layer protocols in the Internet protocol suite, TCP and
UDP. TCP provides a reliable flow between two hosts. It is connection-oriented and thus
can't be used for multicast. UDP provides a connectionless, unreliable datagram service.
To use UDP as a transport protocol for real-time traffic, some functionality has to be
added. The functionality needed by many real-time applications is combined into RTP,
the real-time transport protocol. RTP is standardized in RFC 1889. Applications typically
run RTP on top of UDP as part of the transport layer protocol.
The first twelve octets are present in every RTP packet, while the list of CSRC
(contributing source) identifiers is present only when inserted by a mixer. The fields have
the following meaning:
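As a sketch, the 12-byte fixed header can be modeled like this (field names follow RFC 1889; the struct is an illustrative in-memory view, not the exact wire layout, which packs these fields bit by bit in network byte order):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative view of the RTP fixed header (RFC 1889).
// Real implementations serialize these fields manually in
// network byte order rather than relying on struct layout.
struct RtpHeader {
    uint8_t  version;         // V: 2 bits, always 2 for RTP version 2
    bool     padding;         // P: 1 bit, padding octets at end of payload
    bool     extension;       // X: 1 bit, a header extension follows
    uint8_t  csrc_count;      // CC: 4 bits, number of CSRC identifiers
    bool     marker;          // M: 1 bit, profile-defined (e.g. frame boundary)
    uint8_t  payload_type;    // PT: 7 bits, media type/encoding of the payload
    uint16_t sequence_number; // increments by one per packet; detects loss
    uint32_t timestamp;       // sampling instant of the first payload octet
    uint32_t ssrc;            // synchronization source identifier
    // followed by 0..15 CSRC entries when inserted by a mixer
};

// Pack the first two octets to show the bit layout.
uint8_t first_octet(const RtpHeader& h) {
    return static_cast<uint8_t>((h.version << 6) | (h.padding << 5) |
                                (h.extension << 4) | (h.csrc_count & 0x0F));
}
uint8_t second_octet(const RtpHeader& h) {
    return static_cast<uint8_t>((h.marker << 7) | (h.payload_type & 0x7F));
}
```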
RTP features
RTP provides end-to-end delivery services for data with real-time
characteristics, such as interactive audio and video.
Applications typically run RTP on top of UDP to make use of its
multiplexing and checksum services. But efforts have been made to make
RTP transport-independent so that it could be used on other protocols.
RTP itself does not provide any mechanism to ensure timely delivery or
provide other quality-of-service guarantees, but relies on lower-layer
services to do so. It does not assume that the underlying network is
reliable or that it delivers packets in sequence.
RTP is a protocol framework that is deliberately not complete. A complete
specification of RTP for a particular application requires a profile
specification and/or a payload format specification.
RTP doesn't assume anything about the underlying network, except that it
provides framing. Its original design target was the Internet, but it is
intended to be protocol-independent. For example, test runs of RTP
transmissions over ATM AAL5 and IPv6 are in progress.
The PT (payload type) field of the RTP header identifies in seven bits the
media type and encoding/compression format of the payload. At any given
time an RTP sender is supposed to send only a single payload type,
although the payload type may change during transmission (e.g. in
reaction to bad receiving-rate feedback from the receiver via RTCP
packets).
RTP provides functionality suited for carrying real-time content, e.g. a
timestamp and control mechanisms for synchronizing different streams
with timing properties. Because RTP/RTCP is responsible for controlling
the flow of a single media stream, it will not automatically synchronize the
various streams; this has to happen at the application level.
The basis for flow and congestion control is provided by RTCP sender and
receiver reports. We distinguish transient congestion from persistent congestion. By
analyzing the interarrival jitter field of the sender report, we can measure the
jitter over a certain interval and detect congestion before it becomes persistent and
results in packet loss.
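The interarrival jitter mentioned above is defined in RFC 1889 as a running estimate smoothed with a 1/16 gain; a minimal sketch (all times in RTP timestamp units):

```cpp
#include <cassert>
#include <cstdint>
#include <cmath>

// Interarrival jitter estimator as defined in RFC 1889, section 6.3.1:
// D(i-1,i) = (R_i - R_{i-1}) - (S_i - S_{i-1});  J += (|D| - J) / 16
// where R is the arrival time and S the RTP timestamp of each packet.
struct JitterEstimator {
    double jitter = 0.0;
    bool have_prev = false;
    int64_t prev_arrival = 0, prev_ts = 0;

    void on_packet(int64_t arrival, int64_t rtp_ts) {
        if (have_prev) {
            int64_t d = (arrival - prev_arrival) - (rtp_ts - prev_ts);
            jitter += (std::fabs(static_cast<double>(d)) - jitter) / 16.0;
        }
        prev_arrival = arrival;
        prev_ts = rtp_ts;
        have_prev = true;
    }
};
```

Perfectly paced packets keep the estimate at zero; any deviation between arrival spacing and timestamp spacing pushes it up, which is exactly the early-congestion signal described above.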
3. MPEG Archival
Before proceeding with this section, reading the 'MPEG Systems, an
overview' section is highly recommended.
Introduction
By MPEG archival we mean the act of storing an MPEG stream into a
file. Although it seems a trivial problem when we first approach it, some questions and
problems arise when we consider how to bring this feature to life.
The first thing to establish is our requirements for the saved file:
Probably the most important requirement is that the saved stream be compliant
with the MPEG standard. Compliance here means generating a stream that follows
the syntax defined in the international standards ISO/IEC 11172 and
ISO/IEC 13818, also known as MPEG-1 and MPEG-2.
However, MPEG compliance alone is not enough to meet our requirements. We
also have to make sure that certain constraints are met in the stream. For example, we
could have an MPEG stream where the embedded clock resets to zero from time to time.
Because the embedded clock is usually used by the player to report a global position in
the stream, such behavior would fool the player and the results would be totally
unpredictable. In video streams, for example, it would be desirable to start with a closed
GOP when possible, so that references to unavailable pictures are avoided.
Accordingly, because the system has to work at the different levels of the
MPEG standard, either with multiplexed streams or with elementary streams, we have to
review what steps should be taken to be compliant in all cases.
In order for the following actions to be true we assume that the original stream is
already MPEG compliant.
MPEG audio elementary streams have a simple structure formed by audio frames.
An audio frame is the smallest portion of MPEG audio data that can be decoded.
When saving this kind of stream, one has to make sure that the stream starts and
finishes with an MPEG audio frame.
The audio elementary stream must start with a valid MPEG audio frame.
The audio elementary stream must end with a valid MPEG audio frame.
In order for an MPEG decoder to learn what kind of stream it is going
to decode, a sequence header has to appear at the beginning of the stream. That is what all
existing decoders look for to start the decoding process. The sequence header
carries information as important as video resolution, frame rate, VBV buffer size and so on.
However, the MPEG standard does not require that the sequence header be
transmitted repeatedly; once at the beginning of the stream is enough. Many MPEG-1
streams are like this and only present one sequence header when the stream starts.
The video elementary stream must start with a valid sequence header
It is also desirable that no video glitches appear when the
decoder starts its work. To achieve that, we could make sure that our stream starts with a
closed GOP, that is, one where no references to previous GOPs are needed to decode the
pictures in the current GOP. However, that behavior is optional, so we can't rely
on the appearance of a closed GOP as a good start point.
In the worst case, we should set the broken_link flag in the GOP
header to indicate that the GOP has been extracted from an existing stream and some
references are missing. The decoder can use that information to skip presenting the first
damaged frames.
Last, it is also necessary to end the video stream with the last picture of the last
GOP received. This ensures that we have all the information needed to decode all the
frames. We should also add a sequence_end_code at the end of the stream.
The video elementary stream must end with the last frame of the last
complete GOP received.
A sequence_end_code must be added to the end of the stream
Due to the similarity between the syntax of MPEG-2 and MPEG-1 we can
treat both as a single case.
A program/system stream is the multiplex format that MPEG streams usually use.
That implies that audio and video elementary streams are carried inside this kind of mux
stream and, as a consequence, all the requirements we have seen for video and audio
elementary streams also apply in this case.
Additionally, we have to meet further restrictions that are exclusive to
program and system streams.
The stream must start with a pack header with the correct value in the
program_mux_rate field.
The stream must include a system header that is valid for the MPEG
data that follows, or valid until a new system header is found. A valid
system header has to include the correct fields for the stream we are handling.
The stream must end with a program_stream_end_code
Due to the similarity between Program Streams (MPEG-2) and System Streams
(MPEG-1) we can treat both as a single case.
Transport Streams
A transport stream is another multiplex format, so again audio and video
elementary streams are carried inside it and, as a consequence, all the requirements we
have seen for video and audio elementary streams apply in this case as well.
Additionally, we have to meet further restrictions that are exclusive to
transport streams.
All transport streams are formed by transport packets of 188 bytes each. In terms
of MPEG compliance it is enough to ensure that the stream starts with a transport packet
and ends with one.
The stream must start with a transport packet.
The stream must end with a transport packet.
Special care has to be taken in detecting the transport packet, because the
sync byte of the transport packet, 0x47, is very easily emulated in any part of
the stream.
The Approach
Now that we have seen the kinds of problems that need to be solved, we are in a
position to propose different approaches to building the system.
The first approach one could think of is, of course, saving the incoming raw
bitstream directly to a file. However, this poses serious problems for meeting the
requirements.
Suppose, for example, that we set out to save a video elementary stream. By starting to
save the stream at a random point we are very likely to miss the sequence_start_code,
which is usually what any player looks for to start operating. Also, without further
analysis, it is very unlikely that the saved video will start with a closed GOP, and even
less likely that it will start with a picture header. Most decoders will reject such a
stream as invalid. If the decoder knew beforehand that the incoming stream was a video
elementary stream, there is a chance that decoding would start. However, some
artifacting would probably occur at the beginning of the decoding process, because no
one took care of starting the stream with a closed GOP when available, and the
references are lost.
A different approach must be taken. Although there are many possible solutions to
this problem, we present here the one that we opted for. The problem was attacked
with two different processes.
In the first, or analysis, process the incoming stream is analyzed and all
its important features are labeled and indexed. Important information from the
stream is also retrieved for later use.
In the second, or decision, process all the information from the
analysis phase is taken, decisions are made and actions are performed.
It is important to note that both processes run concurrently, due to the realtime nature
of the project: the data extracted by the analysis process is continuously fed to the
decision process.
[Figure: the incoming MPEG stream feeding the concurrent analysis and decision processes]
Analysis process
During the analysis process, key data from the incoming stream is retrieved. The
stream is fully indexed and labeled so that appropriate decisions can be taken from that
information.
For each type of stream we can have (audio and video elementary streams,
program and transport streams) we have to define what the analysis consists of. In the next
pages we discuss the options taken in this respect and how some of the
problems were solved.
[Flowchart: audio analysis. Scan for the sync word; once a candidate is confirmed, label its position as a valid sync word, otherwise keep scanning.]
The first thing we do is wait for the arrival of the MPEG audio frame sync
word. Because the sync word is composed of only 12 bits (0xFFF), we have no guarantee
that the sync word we just detected is the beginning of a real audio frame.
To figure out whether we have a valid start of an audio frame, we must further
analyze the data that follows the detected sync word.
Let’s take a look at the MPEG audio frame header:
mpeg_audio_header()
{
    syncword             12 bits
    ID                    1 bit
    layer                 2 bits
    protection_bit        1 bit
    bitrate_index         4 bits
    sampling_frequency    2 bits
    padding_bit           1 bit
    private_bit           1 bit
    mode                  2 bits
    mode_extension        2 bits
    copyright             1 bit
    original/home         1 bit
    emphasis              2 bits
}
We can quickly see that the data following the sync word we found should
conform to the pattern above. Not all fields can take all possible values, which
adds even more reliability to our method of detecting emulated sync words.
Besides, the position of consecutive sync words, that is, the audio frame size, can
be calculated from the header information following the sync word (bitrate index,
sampling frequency and padding bit): the bitstream is subdivided into slots. The distance
between the starts of two consecutive sync words is constant and equals N slots, where
the value of N depends on the layer.
If this calculation does not give an integer, the result is truncated and
'padding' is required. In that case the number of slots in a frame varies between N and
N+1. The padding bit is set to '0' if the number of slots equals N, and to '1' otherwise. This
knowledge of the position of consecutive sync words greatly facilitates the task of finding
valid audio frame beginnings.
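The slot arithmetic above can be sketched as follows (constants from ISO/IEC 11172-3; the helper assumes MPEG-1 audio, with a 4-byte slot in Layer I and a 1-byte slot in Layers II and III):

```cpp
#include <cassert>

// Frame length in bytes for MPEG-1 audio (ISO/IEC 11172-3).
//   Layer I:        N = 12  * bitrate / sampling_rate slots of 4 bytes
//   Layers II, III: N = 144 * bitrate / sampling_rate slots of 1 byte
// The division is truncated; the padding bit adds one extra slot.
int frame_length(int layer, int bitrate, int sampling_rate, bool padding) {
    if (layer == 1)
        return (12 * bitrate / sampling_rate + (padding ? 1 : 0)) * 4;
    return 144 * bitrate / sampling_rate + (padding ? 1 : 0);
}
```

For a 128 kbit/s Layer III stream at 44.1 kHz this gives 144 * 128000 / 44100 = 417 bytes, or 418 when the padding bit is set, which is exactly the distance at which the next sync word is expected.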
Once we have finally determined a valid sync word, we label its position so we can
return to it later if we decide to start saving from that point.
[Flowchart: video analysis. Wait for a sequence_header_startcode, then repeatedly wait for the next startcode (gop_header_startcode or picture_header_startcode) and label it.]
The first thing we have to do in the case of an elementary video stream is wait for
a sequence header. Because of the way MPEG video elementary streams are
constructed, if the stream was originally MPEG compliant no startcode emulation can
appear. For this reason, we do not have to double-check that the sequence header we found
is actually a sequence start code.
The process from that point involves labeling all the sequence header, GOP header and
picture header appearances so we can decide later what actions to take.
We also retrieve and store specific information, like the quantization matrices and
resolutions from the sequence header, or the broken_link and closed_gop flags from the
GOP header.
Program/System Streams
[Flowchart: program/system stream analysis. Label the pack header and store its info; if the next startcode is a system header, label it and retrieve its info, otherwise keep waiting.]
The first code that we wait for when dealing with program streams is the
pack_start_code. We also require that a system header follow the pack header.
The system header carries vital information about the stream, like the number of streams
and the sizes of the buffers for the demultiplexing operation, so it is very important to
retrieve it before continuing.
From that point we label and retrieve the information from all the pack headers
that pass by. Especially important is the SCR (System Clock Reference) that can be found
in every pack header. That information will be helpful later to identify disruptions in the
stream and to modify the timestamps inside the PES headers.
When we come across a PES header we also label its position and retrieve the
timestamps included in it, if any.
Also, when we have received a PES header, we must analyze the payload and deal
with that stream. For example, if we have a video PES header, we must take the
payload of the PES packet and analyze it as stated in the video elementary stream part.
This process is equivalent to demultiplexing the program stream and dealing with the
individual streams separately, which indicates that we can build the system in
a modular fashion, with different parsers for the different elementary streams
and a system that controls them.
Transport Streams
[Flowchart: transport stream analysis. Scan for the sync byte; once confirmed, label the position as a valid sync byte and analyze the packet data.]
In the case of transport streams it is very important to identify the correct
beginning of a transport packet.
A transport packet begins with the sync byte 0x47, and it is very likely that this
byte will be emulated somewhere in the stream.
The algorithm chosen to synchronize with a transport packet is the following:
because all transport packets have a fixed length of 188 bytes, a new sync byte should
appear 188 bytes after the detected one. To make the detection more robust, we
repeat this check several times, requiring that the sync byte be found at the expected
position every time. If any of the checks fails, we start over from scratch. This algorithm
proved to be really robust for synchronizing with transport stream packets.
On the other hand, the transport stream case can be seen like the program stream case:
it is a multiplex that basically wraps a PES stream. So, once we have synchronized
with the stream, we must analyze the data inside the PES packets as elementary
streams. Again, this process is equivalent to demultiplexing the transport stream and
dealing with the individual streams separately, which further indicates that we can
build the system in a modular fashion with different parsers for the
different elementary streams.
Decision process
The decision process is carried out concurrently with the analysis process. At this
stage, taking the information from the analysis, the algorithm determines the final
course of action for the given scenario.
Let's recall what kind of information we retrieve from the analysis process:
Labels. They specify the position of all the important features in the
stream. With them we know where sequence headers, GOP headers and
picture headers begin, so we have quick access to that information when
we need it.
Feature info. We retrieved the contents of the sequence headers, GOP
headers and picture headers and can use this information now. For
example, if we want to terminate the stream we can use the
temporal_reference field to determine which picture is last in the
stream and can be used correctly.
All the decisions depend on the type of stream we are dealing with. Let's
review each case in particular to see the decision flow that is taken.
[Flowchart: audio decision process. Wait until enough audio frames have accumulated, then save them.]
For this case we wait until we have an audio frame label. Finding an audio frame label is
equivalent to having found an MPEG audio frame.
Avoiding saving every frame individually helps efficiency, so we wait until we
have accumulated a number of audio frames before writing them to disk.
[Flowchart: video decision process. Wait for a sequence header label; when a complete GOP of pictures is available, modify the GOP headers and save the GOP.]
For video elementary streams the following approach was taken.
At the beginning of the process we wait for a sequence header label to appear.
Remember that the sequence header is a must at the beginning of any VES, due to the
key information it carries.
Once we have the sequence header, we track the GOP labels.
The basic idea is that we have a valid piece of the stream to save
between GOP header marks: the pictures inside the GOP start just after the first GOP header,
and all of them are contained between the two GOP header labels. In that case
we are set to save that portion of the stream to disk.
An interesting problem arises when trying to determine the cutoff point at the last
picture of the stream. Because the picture header does not contain the size of
the picture, we can't know beforehand where the picture ends, and consequently we don't
know the last byte that we will save.
The first option would be to parse the whole picture data to get to the
end of it. This option is unattractive due to the complex structure of the MPEG
video frame data, which would consume more computational resources than
strictly needed.
On the other hand, we can make a safe assumption about the MPEG video
elementary stream: before a sequence header or GOP header startcode we will
have the last byte of the previous picture coded in the stream. Taking a look at the
bitstream syntax:
video_sequence() {
    next_start_code()
    sequence_header()
    do {
        extension_and_user_data( 0 )
        do {
            if (nextbits() == group_start_code) {
                group_of_pictures_header()
                extension_and_user_data( 1 )
            }
            picture_header()
            picture_coding_extension()
            extensions_and_user_data( 2 )
            picture_data()
        } while ( (nextbits() == picture_start_code) ||
                  (nextbits() == group_start_code) )
        if ( nextbits() != sequence_end_code ) {
            sequence_header()
            sequence_extension()
        }
    } while ( nextbits() != sequence_end_code )
    sequence_end_code
}
Analyzing the syntax, we can derive that after the picture data
(picture_data()) we can only have sequence header start codes, picture start codes or
group start codes. Our assumption is therefore correct, and we can use this fact to
determine the end of the last picture coded in the GOP.
If the first GOP we are saving is an open GOP, we have lost all the
references needed to decode its first pictures. In this case, we indicate it by
setting the broken_link flag in the first GOP header that we are going to save.
[Flowchart: mux decision process. When the payload has been validated for saving, modify the SCR and timestamps and save the mux stream portion.]
As we stated, the program stream must start with a pack header / system header pair,
so the decision process waits for those two at the beginning. From that point the mux
process enters a loop looking for PES headers; from the mux point of view, all data inside a
PES can be saved.
We saw during the analysis process how, when dealing with multiplexed
streams, we have to merge the analysis of the individual elementary streams with that
of the mux stream.
For example, suppose we are receiving a program stream with one video elementary
stream inside it. The analysis process marked all the important features at both levels: the
elementary stream and the mux stream. Now we have to take into account whatever
decisions are necessary for the video elementary stream. We can have a portion
of the program stream ready to be saved, yet still be waiting to have a complete group of
pictures inside the payload of the mux before doing the actual saving.
In any case, we have to look at the labels and information from the analysis
of the video elementary stream to be able to make a decision. After joining the
two sets of labels we can determine which parts of the stream are ready to be saved.
[Figure: a program stream as a sequence of PES Header | Payload | PES Header | Payload | PES Header | Payload | PES Header | Payload | PES Header]
In this example, the grey parts are those that can be saved. In the last PES packet,
the payload has been only partially validated for saving; thus we can't save that part of the
payload, or even its PES header. Only those PES packets whose payload has been
fully validated for saving can actually be saved for now.
When the moment arrives to finish the operation, we can be left with
PES packets whose payloads are not fully validated. One possible course of action is to pad
the non-validated areas with zeroes, so no more data is seen there, and then validate the
whole PES.
Implementation
The system was coded in the C++ language, which lets the programmer write
code very modularly, something that was especially important in this project. C++ is
an object-oriented language: code can be written as objects that act
independently and have a life of their own.
Parsers. The core of the system; they perform the analysis and
decision processes for each type of stream. A common interface was created for the
parser objects, so combining them was very straightforward. The basic idea
is that the stream is presented to the parser, and it is the parser that
validates the areas of it to be saved. For example, the program stream parser does
its work on the PES headers and then passes the payload to a video elementary
stream parser. The latter returns the validated payload to the program stream
parser, making the work done inside the elementary payload transparent to
the program stream parser.
The most convoluted case is the transport stream, where the TS
parser calls the PS parser, which in turn does the same with
the ES parser.
Flush File Map. The file map object is a copy of the incoming stream in
system memory. The parsers label the stream features in terms of offsets into the
flush file map, modify the stream in it and validate chunks of it.
It is also useful for buffering reasons: because we don't know when we will be
able to start storing the file, we need a sizable piece of the stream buffered before
we do.
The flush file map is also an abstraction of the file being saved; as time goes
by, all the data in the file map ends up in the saved file. All the offsets are kept,
so all labels remain valid.
Fully validated areas of the map are no longer represented in memory, so they
can no longer be accessed.
Introduction
To understand the concept of MPEG file streaming, consider the following
scenario.
We have an archived MPEG file stored on a computer connected to a
TCP/IP network. We want to multicast, or stream, the file to multiple
recipients, emulating a broadcast station where only one emitter is present and multiple
receivers are watching the same content.
The first problem to solve is how to deliver the MPEG file to the network.
Although a trivial problem at first sight, some questions arise.
It is very important to note here that we are talking about file streaming where no
feedback from the recipients is received. That means our system has to work
independently of the number of receivers.
The system
The system has to emulate the behavior of an MPEG encoder, trying to avoid
overflows or underflows in the buffering system of the decoder. To achieve that, the
sending of the data has to be precisely timed; otherwise glitches will appear in the
video.
[Flowchart: audio streaming. Synchronize with an audio frame, retrieve sampling rate and layer info, send the frame, wait until the frame duration has elapsed, compensate timing, repeat.]
To stream MPEG audio data, the most reasonable approach seems to be
sending audio frames one after the other, keeping the pace at which they have to be
played back.
Two options can be used to time the outgoing pace of the frames: either we
use the bitrate information in the frame, or we exploit the fact that every
frame represents a fixed number of samples.
The second option was chosen. The bitrate information depends on how
the encoder built the frame and can quickly introduce inaccuracies; the number of
samples per frame, on the other hand, is fixed, which makes the second option much
more robust.
For example, in the case of 44.1 kHz Layer III audio, each frame carries 1152
samples, so the frame duration is 1152 / 44100 ≈ 26.12 ms.
Given that the timing granularity (the precision with which the operating system can
time our operations) on the Windows platform is 10 ms, we are going to need a
mechanism to compensate for the errors introduced by the operating system.
Timing Compensation
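A sketch of the compensation idea: instead of chaining relative waits, deadlines are kept on an absolute clock, so an oversleep on one frame automatically shortens the wait for the next and the coarse OS granularity does not accumulate into drift (the exact mechanism used by the project may differ):

```cpp
#include <cassert>

// Pace frames against absolute deadlines rather than by summing
// individual sleeps: any oversleep on one frame shortens the wait
// for the next one, so 10 ms OS granularity never accumulates.
struct FramePacer {
    double next_deadline_ms;
    double frame_duration_ms;

    FramePacer(double start_ms, double frame_ms)
        : next_deadline_ms(start_ms), frame_duration_ms(frame_ms) {}

    // Given the current clock reading, return how long to wait before
    // sending the next frame (0 when already late), then advance the
    // deadline by one frame duration.
    double wait_for_next(double now_ms) {
        double wait = next_deadline_ms - now_ms;
        if (wait < 0.0) wait = 0.0; // running late: send immediately
        next_deadline_ms += frame_duration_ms;
        return wait;
    }
};
```

With the 26.12 ms Layer III frames above, oversleeping to t = 30 ms simply yields a shorter wait of about 22.24 ms for the following frame, keeping the long-run rate exact.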
[Flowchart: video streaming. Synchronize with a video frame, parse the picture data, send it, wait until the frame duration has elapsed, compensate timing, repeat.]
For video elementary streams the mechanism is no different from the audio one: we
grab whole video frames and then send them.
Because we need to send an entire frame, we scan ahead to the beginning of the next
picture and treat the data between picture start codes as the picture data.
The duration of a video frame, i.e. the frame rate, can be learned from the
sequence header of the video sequence. Special care has to be taken with MPEG-2 field
pictures, where the wait time is half the frame period. Because an MPEG-2 sequence can
interleave field and frame pictures, we have to extract this information
from the picture headers to know the exact time we will have to wait.
Again, the frame/field periods are of the order of 10 ms, which turns out to be the
granularity of our operating system. A compensation mechanism is therefore required;
it is exactly the same one we used for audio elementary streams, so we will not repeat it
here.
Program/System Stream
[Flowchart: program/system stream streaming. Synchronize with a pack header and grab the SCR stamp; if SCR < STC send the pack, otherwise wait a lapse and re-check.]
An interesting problem arises in this case. In many sample streams it was seen
how the SCR stamp inside the stream was reset to zero from time to time. Because our
system compares the SCR with a monotonically increasing STC (System Time
Clock), some kind of adaptation must be done before the comparison.
So, just after grabbing the SCR stamp, if we detect that it jumps by more than one
second from the previous value, we assume there has been a discontinuity in the
stream, and from that point on we adjust the SCR so the comparison with the STC
remains meaningful.
The algorithm is then very simple. If the SCR in the stream lags behind the STC,
we just send the pack. If the SCR is still in the future, we wait a little and try again.
Wait lapses of 10 ms proved small enough to make the system
work.
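The decision above can be sketched as follows (names are illustrative; both clocks are assumed to be in 90 kHz ticks, the unit of the SCR base):

```cpp
#include <cassert>
#include <cstdint>

// Outcome of comparing a pack's SCR against the local System Time Clock.
struct SendDecision {
    bool send_now;
    int wait_ms;
};

// Decide whether a pack whose SCR stamp is `scr` may be sent when the
// STC reads `stc` (both in 90 kHz ticks). When the SCR is still in the
// future we wait a 10 ms lapse and re-check, which proved small enough
// in practice.
SendDecision pace_pack(int64_t scr, int64_t stc) {
    if (scr <= stc)
        return {true, 0};   // SCR lags the STC: send immediately
    return {false, 10};     // SCR in the future: wait a bit and re-check
}
```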
Overall, the system's output proved to closely resemble that of a real-time encoder,
which was what we were looking for. In tests with a decoder no flaws were observed, a good
indication that neither overflows nor underflows were occurring in the stream.
Transport stream
[Flowchart: transport stream streaming. Lock onto a PCR stream; synchronize with a transport packet; when a PCR is available, grab the stamp; if PCR < STC send the packet, otherwise wait a lapse and re-check.]
The first difference is that the embedded clock, the PCR (Program Clock Reference),
is not included in every transport packet. Moreover, several programs can travel
inside a transport stream, each with its associated PCR.
The option taken to solve this issue was to lock onto the first PCR stream found in the
transport. From that point on, we track only that PCR stream: we look at every transport
packet and find out whether its adaptation field contains a PCR. Otherwise, the
mechanism is the same as we saw for program streams.
One of the attractive sides of MPEG over RTP is that a well-defined way of
carrying MPEG over it is set out in RFC 2250 (see annex).
Due to time constraints, it was not possible to implement the full specification of
RFC 2250, and some shortcuts were taken to get to a working final product with a limited
set of features.
We now describe the exact steps taken to packetize MPEG over RTP.
In all these cases the streams were inserted into the RTP encapsulation as a
packetized stream of bytes. Although this is correct for program and system streams,
audio and video elementary streams require considerably more work, as reflected in
RFC 2250. Please refer to that document for the details of audio and video encapsulation.
In particular, RTP packets with a maximum payload of 1460 bytes were chosen.
The incoming buffers from the system were sliced into packets of 1460 bytes, and the
remaining bytes were sent as a smaller RTP packet. Although this is not an optimal
solution, some features such as sequence numbering in the RTP packets were used.
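The slicing can be sketched as follows (an illustrative Python sketch with hypothetical names; the real sender builds full RTP headers, which are omitted here):

```python
# Sketch of the byte-stream packetizer (hypothetical names): slice
# each incoming buffer into 1460-byte payloads, sending any
# remainder as a smaller final packet.

MAX_PAYLOAD = 1460  # bytes per RTP packet

def packetize(buffer, seq):
    """Yield (sequence_number, payload) pairs; seq is the next RTP
    sequence number and increases by one per packet sent."""
    for off in range(0, len(buffer), MAX_PAYLOAD):
        yield seq, buffer[off:off + MAX_PAYLOAD]
        seq = (seq + 1) & 0xFFFF  # RTP sequence numbers are 16-bit
```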
The timestamping capability inside the RTP header was not used.
Transport Streams
One of the selling points of the system was supposed to be the Transport Stream
support over RTP, so special care was taken to be more compliant with the RTP standards
here.
The option taken was to packetize exactly seven transport packets in each RTP
packet. Because every transport packet is 188 bytes long, that adds up to a total of 1316
bytes.
In this case too, only sequence numbers were used in the RTP header.
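The grouping can be sketched as follows (an illustrative Python sketch with hypothetical names; incomplete groups simply wait for more packets):

```python
# Sketch of transport-stream packetization (hypothetical names):
# exactly seven 188-byte transport packets per RTP payload, giving
# 7 * 188 = 1316 bytes, an integral number of packets as RFC 2250
# requires.

TS_PACKET_SIZE = 188
TS_PER_RTP = 7

def group_ts_packets(ts_packets):
    """Group 188-byte transport packets into 1316-byte RTP payloads,
    yielding only complete groups of seven."""
    for i in range(0, len(ts_packets) - TS_PER_RTP + 1, TS_PER_RTP):
        yield b"".join(ts_packets[i:i + TS_PER_RTP])
```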
Notably less complex, the work of the receiver merely consisted of taking the
payload of the RTP packets and delivering it to whatever system the receiver was hooked
to: a real-time hardware decoder, our archival system, or a network transceiver.
However, one important feature was implemented to really justify the use of RTP.
Because we encode the sequence numbers at the transmitting side (recall that the
sequence number increases by one with each packet sent), we can detect at the receiving
side whether some packets have been lost or delivered out of order. Because UDP runs
under the RTP layer and gives no such guarantees, this feature is genuinely useful.
A simple window algorithm was chosen to reorder the incoming RTP packets. An
interesting point to note is that the bigger the reordering window, the more latency we
add to the system. For this reason, small reordering windows of 3 or 4 packets were
chosen.
[Figure: receiver reordering loop. Buffer N packets, reorder them, and deliver the oldest packet.]
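The window algorithm can be sketched as follows (an illustrative Python sketch with hypothetical names; for simplicity it ignores 16-bit sequence-number wraparound, which a real receiver must handle):

```python
import heapq

# Sketch of the receive-side reordering window (hypothetical names):
# buffer up to `window` packets keyed by RTP sequence number and
# always deliver the oldest. A larger window reorders more but adds
# more latency, hence the small 3-4 packet windows chosen.

WINDOW = 4

def reorder(packets, window=WINDOW):
    """Reorder (seq, payload) pairs with a small sliding window."""
    heap, out = [], []
    for seq, payload in packets:
        heapq.heappush(heap, (seq, payload))
        if len(heap) > window:
            out.append(heapq.heappop(heap))  # deliver oldest packet
    while heap:                               # drain at end of stream
        out.append(heapq.heappop(heap))
    return out
```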
6. In the works
Although almost all the main goals of the project were achieved, some features
could have been improved, and some work was left open for continuation. Let us discuss
those parts individually, one per section of this paper.
MPEG Archival
Almost all of the goals were achieved, although one important task was left to do:
MPEG restamping. One of the requirements of the project was the ability for the system
to generate multiplexed streams whose embedded clock (SCR or PCR) increases
monotonically with time. This behavior would make an MPEG file player see the file as
continuous in time, and no seeking problems would occur.
To get to that point, one has to modify all the timestamps in the stream (SCR,
PCR, ESCR, PTS and DTS) in a two-step process. First the disruption must be detected;
then a compensation time must be calculated and applied to the stream in real time.
Although some attempts were made in this area, it was not possible to obtain solid
and consistent behavior in terms of performance. This feature therefore remains work in
progress.
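The two-step idea can be sketched as follows (an illustrative Python sketch of the detect-then-compensate scheme; names and the exact compensation policy are hypothetical, since this part remains unfinished in the thesis):

```python
# Sketch of restamping (hypothetical names): detect a disruption,
# compute a compensation offset, and apply it to every later
# timestamp so the embedded clock increases monotonically. The same
# offset would be applied to SCR/PCR, ESCR, PTS and DTS alike.

JUMP_THRESHOLD = 90000  # one second at the 90 kHz MPEG clock

def restamp(stamps):
    """Return stamps adjusted so they never decrease or jump forward
    by more than JUMP_THRESHOLD between consecutive values."""
    offset, prev, out = 0, None, []
    for s in stamps:
        if prev is not None and (s < prev or s - prev > JUMP_THRESHOLD):
            # Step 1: disruption detected.  Step 2: compensate so the
            # stream continues one tick after the previous stamp.
            offset += prev + 1 - s
        out.append(s + offset)
        prev = s
    return out
```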
One interesting feature was brought up after the work started: the need to pause
the file streaming indefinitely and the ability to resume it from the stopping point. The
feature would require some changes in the timing part of the system, and it does not
seem too problematic to implement.
As stated previously in this paper, most of the specifications in the RFC 2250
document were left out of the implementation.
Particularly important is the packetization of audio and video elementary streams
as per the RTP specification. The spec ensures that small decodable units of the MPEG
video stream (namely MPEG video slices) are individually packetized in RTP packets.
The first and most important advantage of doing this is that if an RTP packet is lost, the
rest of the picture can still be decoded without big artifacts appearing in the decoding
process. Again, this feature was left out, so it would be extremely interesting to have
it in.
Some other features, such as using the timestamping inside the RTP headers for
program and system streams, would also have been interesting, although this last feature
is not as important as the audio/video packetization.
7. Sources
ISO/IEC IS 13818, ITU-T Recommendation H.262, "Information technology -
Generic coding of moving pictures and associated audio information".
ISO/IEC IS 11172, "Information technology - Coding of moving pictures and
associated audio for digital storage media at up to about 1,5 Mbit/s".
H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A Transport Protocol
for Real-Time Applications", RFC 1889, January 1996.
D. Hoffman, G. Fernando, V. Goyal, M. R. Civanlar, "RTP Payload Format for
MPEG1/MPEG2 Video", RFC 2250, January 1998.
8. Annex
RFC 2250
Copyright Notice
Abstract
This memo describes a packetization scheme for MPEG video and audio
streams. The scheme proposed can be used to transport such a video
or audio flow over the transport protocols supported by RTP. Two
approaches are described. The first is designed to support maximum
interoperability with MPEG System environments. The second is
designed to provide maximum compatibility with other RTP-encapsulated
media streams and future conference control work of the IETF.
1. Introduction
Much interest in the MPEG community is in the use of one of the MPEG
System encodings, and hence, in Section 2 we propose encapsulations
of MPEG1 System streams and MPEG2 Transport and Program Streams with
RTP. This profile supports the full semantics of MPEG System and
offers basic interoperability among all four end-system types.
Each RTP packet will contain a timestamp derived from the sender's
90KHz clock reference. This clock is synchronized to the system
stream Program Clock Reference (PCR) or System Clock Reference (SCR)
and represents the target transmission time of the first byte of the
packet payload. The RTP timestamp will not be passed to the MPEG
decoder. This use of the timestamp is somewhat different than
normally is the case in RTP, in that it is not considered to be the
media display or presentation timestamp. The primary purposes of the
RTP timestamp will be to estimate and reduce any network-induced
jitter and to synchronize relative time drift between the transmitter
and receiver.
For MPEG2 Transport Streams the RTP payload will contain an integral
number of MPEG transport packets. To avoid end system
inefficiencies, data from multiple small MTS packets (normally fixed
in size at 188 bytes) are aggregated into a single RTP packet. The
number of transport packets contained is computed by dividing RTP
payload length by the length of an MTS packet (188).
For MPEG2 Program streams and MPEG1 system streams there are no
packetization restrictions; these streams are treated as a packetized
stream of bytes.
MPEG1 Audio can be distinguished from MPEG2 Audio from the MPEG
ancillary_data() header. For either MPEG1 or MPEG2 Audio, distinct
Presentation Time Stamps may be present for frames which correspond
to either 384 samples for Layer-I, or 1152 samples for Layer-II or
Layer-III. The actual number of bytes required to represent this
number of samples will vary depending on the encoder parameters.
This header shall be attached to each RTP packet after the RTP fixed
header.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   MBZ   |T|        TR         | |N|S|B|E|  P  | | BFC | | FFC |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                 AN              FBV     FFV
FBV: full_pel_backward_vector
BFC: backward_f_code
FFV: full_pel_forward_vector
FFC: forward_f_code
Obtained from the most recent picture header, and are
constant for each RTP packet of a given picture. For I frames
none of these values are present in the picture header and
they must be set to zero.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|X|E|f_[0,0]|f_[0,1]|f_[1,0]|f_[1,1]| DC| PS|T|P|C|Q|V|A|R|H|G|D|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The extension start code (32 bits) and the extension start
code ID (4 bits) are included. Therefore the extensions are
self identifying.
T: top_field_first (1 bit)
P: frame_pred_frame_dct (1 bit)
C: concealment_motion_vectors (1 bit)
Q: q_scale_type (1 bit)
V: intra_vlc_format (1 bit)
A: alternate_scan (1 bit)
R: repeat_first_field (1 bit)
H: chroma_420_type (1 bit)
G: progressive_frame (1 bit)
D: composite_display_flag (1 bit). If set to 1, the next 32 bits
following this one contain 12 zeros followed by 20 bits
of composite display information.
These values are copied from the most recent picture coding
extension and are constant for each RTP packet of a given
picture. Their meanings are as explained in the MPEG-2 standard.
This header shall be attached to each RTP packet at the start of the
payload and after any RTP headers for an MPEG1/2 Audio payload type.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MBZ | Frag_offset |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Frag_offset: Byte offset into the audio frame for the data
in this packet.
4. Security Considerations
[Garbled example reproduced from RFC 2250: pictures in stream order 2I 0B 1B 5P 3B 4B 8P 6B 7B followed by a GOP header and 2I 0B 1B ..., showing how the receiver's expected temporal references (2 3 4 5 6 7 8 9 10 11) match until the GOP header is dropped, after which ref_pic_temp mismatches the stream.]
3. expected loss rates are low enough that missed frames are not a
concern, or
If T=1 and E=0, there may be extensions present in the original video
bitstream that are not included in the current packet. The
transmitter may choose not to include extensions in a packet when
they are not necessary for decoding or if one of the cases listed
above for not including the MPEG-2 video specific header extension in
a packet applies only to the extension data.
If N=0, then the Picture Header from a previous picture of the same
type (I,P or B) may be used so long as at least one packet has been
received for every intervening picture of the same type and that the
N bit was 0 for each of those pictures. This may involve:
Any time an RTP packet is lost (as indicated by a gap in the RTP
sequence number), the receiver may discard all packets until the
Beginning-of-slice bit is set. At this point, sufficient state
information is contained in the stream to allow processing by an MPEG
decoder starting at the next slice boundary (possibly after
reconstruction of the GOP_header and/or Picture_Header as described
above).
References
[4] Schulzrinne, H., "RTP Profile for Audio and Video Conferences
with Minimal Control", RFC 1890, January 1996.
Authors' Addresses
Gerard Fernando
Sun Microsystems, Inc.
Mail-stop UMPK14-305
2550 Garcia Avenue
Mountain View, California 94043-1100
USA
Phone: +1 415-786-6373
EMail: gerard.fernando@eng.sun.com
Vivek Goyal
Precept Software, Inc.
1072 Arastradero Rd,
Palo Alto, CA 94304
USA
Phone: +1 415-845-5200
EMail: goyal@precept.com
Don Hoffman
Sun Microsystems, Inc.
Mail-stop UMPK14-305
2550 Garcia Avenue
Mountain View, California 94043-1100
USA
Phone: +1 503-297-1580
EMail: don.hoffman@eng.sun.com
M. Reha Civanlar
AT&T Labs - Research
100 Schulz Drive, 3-213
Red Bank, NJ 07701-7033
USA
Phone: +1 732-345-3305
EMail: civanlar@research.att.com
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.