Advanced Multimedia Technology
Chapter 1: Multimedia Network
RTP & RTCP
QoS for multimedia network
Chapter 2: Voice and Video Over IP
Chapter 3: MPEG-4 & H264
10/1/2007 Nguyen Chan Hung – Hanoi University of Technology 1
RTP & RTCP
RTP and related standards
Types of RTP Sessions
Consider sending 64 kbps PCM-encoded voice over RTP.
The application collects the encoded data in chunks, e.g., every 20 msec = 160 bytes in a chunk (= 8000 bytes/sec / 50).
The audio chunk along with the RTP header forms the RTP packet, which is encapsulated into a UDP segment.
The RTP header indicates the type of audio encoding in each packet; senders can change encoding during a conference.
The RTP header also contains sequence numbers and timestamps.
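The chunk arithmetic above (8000 bytes/sec / 50 chunks/sec = 160 bytes) and the header fields can be sketched in Python. This is a minimal sketch, assuming G.711 µ-law (static payload type 0, 8 kHz clock) as the 64 kbps PCM encoding and a fixed 12-byte header with no CSRCs or extensions. The function names are hypothetical, and the initial sequence number and timestamp default to 0 only for readability; real senders choose random initial values.

```python
import struct

RTP_VERSION = 2
PT_PCMU = 0          # static payload type 0: G.711 mu-law, 8 kHz clock (assumed encoding)
SAMPLE_RATE = 8000   # 8-bit samples per second -> 8000 bytes/s = 64 kbps
CHUNK_MS = 20

def chunk_size_bytes(sample_rate=SAMPLE_RATE, chunk_ms=CHUNK_MS):
    """Bytes of 8-bit PCM per chunk: 8000 bytes/s / 50 chunks/s = 160."""
    return sample_rate * chunk_ms // 1000

def rtp_header(seq, timestamp, ssrc, payload_type=PT_PCMU, marker=False):
    """Pack the fixed 12-byte RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6                     # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | payload_type    # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def packetize(audio, ssrc, seq0=0, ts0=0):
    """Split PCM bytes into RTP packets, one per 20 ms chunk."""
    size = chunk_size_bytes()
    packets = []
    for i in range(0, len(audio), size):
        n = i // size
        # Sequence number +1 per packet; timestamp +160 samples per packet.
        header = rtp_header(seq0 + n, ts0 + n * size, ssrc)
        packets.append(header + audio[i:i + size])
    return packets
```

For example, 60 ms of audio (480 bytes) yields three 172-byte packets (12-byte header + 160-byte payload), with timestamps 0, 160, and 320.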
Uncompressed media data (audio or video) is captured into a buffer, from which compressed frames are produced.
Frames may be encoded in several ways depending on the compression algorithm used.
Compressed frames are loaded into RTP packets for sending.
If frames are large, they may be fragmented into several RTP packets; if frames are small, several frames may be bundled into a single RTP packet.
A channel coder may be used to generate error correction packets or to reorder packets before transmission.
After the RTP packets are sent, the buffered media of those packets is eventually freed.
The sender must keep data buffered for some time after the corresponding packets have been sent, depending on the codec and error correction scheme used.
The sender is responsible for generating periodic status reports for the media streams it is generating, including the information needed for lip synchronization.
It also receives reception quality feedback from other participants and may use that information to adapt its transmission.
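The fragment-or-bundle decision can be illustrated with a small helper. It is a sketch under stated assumptions: `MAX_PAYLOAD` is a hypothetical per-packet payload budget, and real RTP payload formats define how fragments and bundled frames are marked inside the payload, which this sketch leaves out.

```python
MAX_PAYLOAD = 1200  # hypothetical payload budget per RTP packet, in bytes

def build_payloads(frames, max_payload=MAX_PAYLOAD):
    """Group encoded frames into RTP payloads.

    Small frames are bundled until the next one would not fit; a frame larger
    than max_payload is split into max_payload-sized fragments.
    Returns a list of (payload, frame_count, is_fragment) tuples.
    """
    payloads = []
    bundle, bundle_len = [], 0

    def flush():
        # Emit the current bundle (if any) as one payload.
        nonlocal bundle, bundle_len
        if bundle:
            payloads.append((b"".join(bundle), len(bundle), False))
            bundle, bundle_len = [], 0

    for frame in frames:
        if len(frame) > max_payload:
            flush()
            for i in range(0, len(frame), max_payload):
                payloads.append((frame[i:i + max_payload], 1, True))
        elif bundle_len + len(frame) > max_payload:
            flush()
            bundle, bundle_len = [frame], len(frame)
        else:
            bundle.append(frame)
            bundle_len += len(frame)
    flush()
    return payloads
```

With a 1200-byte budget, frames of 500, 500, 500, and 3000 bytes become a 2-frame bundle, a single 500-byte frame, and three fragments of 1200, 1200, and 600 bytes.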
RTP Implementations (2)
The receiver is responsible for:
Collecting RTP packets from the network,
Correcting any losses,
Recovering the timing,
Decompressing the media,
Presenting the result to the user,
Sending reception quality feedback, allowing the sender to
adapt its transmission to the receiver, and
Maintaining a database of participants in the session.
A mixer has a unique view of the session: it sees all sources as synchronization sources, whereas the other participants see some synchronization sources and some contributing sources.
In the figure above, participant X receives data from three synchronization sources (Y, Z, and M), with A and B as contributing sources in the mixed packets coming from M.
Participant A sees B and M as synchronization sources, with X, Y, and Z contributing to M.
The mixer generates RTCP sender and receiver reports separately for each half of the session, and it does not forward them between the two halves.
It forwards RTCP source description and BYE packets so that all participants can be identified.
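To see how a receiver such as X learns about contributing sources, the sketch below packs and parses an RTP header carrying a CSRC list: the mixer M puts its own SSRC in the header and lists the contributing sources (A and B) in the CSRC fields, with the 4-bit CC field giving their count. The function names and SSRC values are hypothetical, and header extensions and padding are ignored.

```python
import struct

def pack_rtp_header(seq, timestamp, ssrc, csrcs=(), payload_type=0):
    """Fixed RTP header followed by up to 15 CSRC identifiers."""
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field: at most 15 CSRCs")
    byte0 = (2 << 6) | len(csrcs)          # V=2, P=0, X=0, CC=number of CSRCs
    header = struct.pack("!BBHII", byte0, payload_type & 0x7F,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)

def parse_rtp_header(packet):
    """Return (ssrc, [csrc, ...], payload) for a packet without extensions."""
    byte0, _byte1, _seq, _ts, ssrc = struct.unpack("!BBHII", packet[:12])
    if byte0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = byte0 & 0x0F
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))
    return ssrc, csrcs, packet[12 + 4 * cc:]

# Mixer M (hypothetical SSRC) forwards a packet mixing sources A and B.
MIXER_SSRC, SSRC_A, SSRC_B = 0x4D4D4D4D, 0xA, 0xB
mixed = pack_rtp_header(7, 960, MIXER_SSRC, csrcs=(SSRC_A, SSRC_B)) + b"mixed"
```

Parsing `mixed` gives M's SSRC as the synchronization source and `[SSRC_A, SSRC_B]` as the contributing sources.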
RTP packet format
• For an RTP session there is typically a
single multicast address; all RTP
and RTCP packets belonging to the
session use the multicast address.
• RTP and RTCP packets are
distinguished from each other through
the use of distinct port numbers.
• To limit traffic, each participant reduces
its RTCP traffic as the number of conference
participants increases, so that the total RTCP traffic stays bounded.
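The scaling rule can be illustrated with a simplified version of the RTCP interval calculation from RFC 3550, where RTCP is limited to 5% of the session bandwidth and the interval grows linearly with the number of members. This sketch deliberately omits the sender/receiver bandwidth split, the randomization of the interval, and timer reconsideration, all of which the full algorithm includes; the function name is hypothetical.

```python
RTCP_FRACTION = 0.05   # RTCP share of session bandwidth (RFC 3550)
T_MIN = 5.0            # minimum interval in seconds; halved before the first report

def rtcp_interval(members, session_bw_bps, avg_rtcp_size_bytes, initial=False):
    """Simplified, deterministic RTCP reporting interval in seconds."""
    rtcp_bw = RTCP_FRACTION * session_bw_bps / 8          # bytes per second
    t_min = T_MIN / 2 if initial else T_MIN
    # Each member sends one report per interval, so total RTCP traffic is
    # members * avg_size / interval, held at or below rtcp_bw.
    return max(t_min, members * avg_rtcp_size_bytes / rtcp_bw)
```

For a 64 kbps session with 100-byte reports, 10 members stay at the 5 s minimum, while 100 members must wait 25 s between reports.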
RTCP packet format
Five types of RTCP packets are defined in the RTP specification:
receiver report (RR),
sender report (SR),
source description (SDES),
membership management (BYE),
and application-defined (APP).
They all follow a common structure (see figure).
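Because every RTCP packet begins with the same header, a receiver can walk a compound RTCP packet without understanding every type. A minimal sketch, assuming the RFC 3550 header layout (V, P, RC/subtype, PT, and a length in 32-bit words minus one) and the packet type values 200 to 204 for SR, RR, SDES, BYE, and APP; `walk_compound` is a hypothetical name and the padding bit is not handled.

```python
import struct

RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}

def walk_compound(data):
    """Yield (type_name, count, body) for each packet in a compound RTCP packet."""
    offset = 0
    while offset + 4 <= len(data):
        byte0, pt, length = struct.unpack("!BBH", data[offset:offset + 4])
        if byte0 >> 6 != 2:
            raise ValueError("bad RTCP version")
        size = (length + 1) * 4                 # length counts 32-bit words minus one
        body = data[offset + 4:offset + size]
        # The low 5 bits are the RC (report/source count) or APP subtype.
        yield RTCP_TYPES.get(pt, "unknown(%d)" % pt), byte0 & 0x1F, body
        offset += size
```

Unknown types are reported by number rather than rejected, so a receiver can skip packets it does not understand.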
The reception quality feedback in RR packets is useful not only for the sender, but also for other participants and third-party monitoring tools:
The RR feedback allows the sender to adapt its transmissions according to the reported reception quality.
Other participants can determine whether problems are local or common to several receivers.
Network managers may use monitors that receive only the RTCP packets to evaluate the performance of their networks.
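The two loss fields a receiver puts in each RR block can be computed roughly as in RFC 3550 Appendix A.3. This sketch assumes the caller already tracks extended (wrap-corrected) sequence numbers and saves the previous report's counters. It also skips clamping the cumulative count to a 24-bit signed field. Function and parameter names are hypothetical.

```python
def rr_loss_fields(base_seq, ext_max_seq, received, expected_prior, received_prior):
    """Return (cumulative_lost, fraction_lost, expected, received).

    base_seq / ext_max_seq: first and highest extended sequence numbers seen.
    received: packets that have arrived so far (RFC 3550 counts late and
    duplicate packets too). expected_prior / received_prior: values saved at
    the previous report; the caller stores the returned expected/received.
    """
    expected = ext_max_seq - base_seq + 1
    cumulative_lost = expected - received          # may go negative with duplicates
    expected_interval = expected - expected_prior
    lost_interval = expected_interval - (received - received_prior)
    if expected_interval == 0 or lost_interval <= 0:
        fraction_lost = 0
    else:
        fraction_lost = (lost_interval << 8) // expected_interval   # 8-bit fixed point
    return cumulative_lost, fraction_lost, expected, received
```

Losing 10 of 100 expected packets gives a fraction-lost byte of 25 (about 10% of 256).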
From the SR, an application can calculate the average payload data rate and the average packet rate over an interval without receiving the data.
The ratio of the two is the average payload size.
If it can be assumed that packet loss is independent of packet size, then:
Receiver throughput = number of packets received × average payload size
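Here is a worked version of the SR arithmetic. Given two sender reports, the differences in packet count and octet count over the elapsed time give the packet rate and the payload data rate. This is a sketch assuming each SR has been reduced to a hypothetical `(ntp_seconds, packet_count, octet_count)` tuple and that the 32-bit counters have not wrapped between the two reports.

```python
def sr_rates(sr_old, sr_new):
    """Payload data rate (bytes/s), packet rate (packets/s), and average payload size."""
    dt = sr_new[0] - sr_old[0]
    packet_rate = (sr_new[1] - sr_old[1]) / dt
    payload_rate = (sr_new[2] - sr_old[2]) / dt     # SR octet count covers payload only
    avg_payload = payload_rate / packet_rate        # ratio of the two rates
    return payload_rate, packet_rate, avg_payload

def receiver_throughput(packets_received, avg_payload):
    """Payload bytes received over the same interval, assuming loss is size-independent."""
    return packets_received * avg_payload
```

For example, 250 packets and 40,000 payload octets in 5 s give 8,000 bytes/s, 50 packets/s, and a 160-byte average payload. A receiver that got 240 of those packets has received about 38,400 payload bytes.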
The NTP and RTP timestamps in the SR are used to generate a correspondence between the media clock and the NTP reference clock.
This correspondence is used for lip synchronization.
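Lip synchronization uses the latest SR of each stream to place RTP timestamps on the sender's wall clock. Once mapped, timestamps from different streams (for example, 8 kHz audio and 90 kHz video) become comparable. A minimal sketch, assuming the SR's NTP timestamp has already been converted to seconds; the function names are hypothetical.

```python
def rtp_to_wallclock(rtp_ts, sr_ntp_seconds, sr_rtp_ts, clock_rate):
    """Map an RTP timestamp onto the sender's NTP wall clock using the latest SR."""
    # Signed 32-bit difference, so timestamps on either side of a wrap still work.
    delta = ((rtp_ts - sr_rtp_ts + 2**31) % 2**32) - 2**31
    return sr_ntp_seconds + delta / clock_rate

def av_offset(audio_ts, audio_sr, video_ts, video_sr):
    """Capture-time difference in seconds (positive if the video frame is later).

    Each *_sr is a (ntp_seconds, rtp_timestamp, clock_rate) tuple.
    """
    return rtp_to_wallclock(video_ts, *video_sr) - rtp_to_wallclock(audio_ts, *audio_sr)
```

A receiver can delay whichever stream is ahead by this offset so that audio and video captured at the same instant are played together.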
SDES
Source DEScription (SDES) packets provide participant identification and supplementary details, such as location, e-mail address, and telephone number.
The information in SDES packets is typically entered by the user and is often displayed in the graphical user interface of an application.
Each list of SDES items starts with the SSRC of the source being described, followed by one or more entries with the format shown in the figure.
Each entry starts with a type and a length field, then the item text itself in UTF-8 format.
The length field indicates how many octets of text are present; the text is not null-terminated.
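The SDES chunk layout can be sketched as an encoder. It assumes the item type numbers from RFC 3550 (END = 0, CNAME = 1, NAME = 2, EMAIL = 3) and the rule that the item list ends with one or more zero octets, padding the chunk to a 32-bit boundary. `sdes_chunk` is a hypothetical name.

```python
import struct

SDES_END, SDES_CNAME, SDES_NAME, SDES_EMAIL = 0, 1, 2, 3   # item types (RFC 3550)

def sdes_chunk(ssrc, items):
    """Encode one SDES chunk: SSRC, then (type, length, text) items, then padding."""
    out = bytearray(struct.pack("!I", ssrc))
    for item_type, text in items:
        data = text.encode("utf-8")          # UTF-8 text, no null terminator
        if len(data) > 255:
            raise ValueError("SDES item text is limited to 255 octets")
        out += bytes([item_type, len(data)]) + data
    out += bytes([SDES_END])                 # END item: a single zero octet
    out += b"\x00" * (-len(out) % 4)         # extra zeros up to a 32-bit boundary
    return bytes(out)
```

A CNAME of `user@host` encodes as 4 (SSRC) + 2 (type, length) + 9 (text) + 1 (END) = 16 octets, so no extra padding is needed.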
The RC field in the common RTCP header indicates the number of SSRC identifiers in the packet.
On receiving a BYE packet, an implementation should assume that the listed sources have left the session and ignore any further RTP and RTCP packets from those sources.
A BYE packet may also contain text indicating the reason for leaving a session, suitable for display in the user interface.
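A receiver's reaction to BYE can be sketched with a small participant table keyed by SSRC. The class and method names are hypothetical. A real implementation would also time out silent participants and eventually forget departed SSRCs, rather than keeping them forever as this sketch does.

```python
class ParticipantTable:
    """Hypothetical receiver-side participant database keyed by SSRC."""

    def __init__(self):
        self.active = {}        # ssrc -> CNAME learned from SDES
        self.departed = set()   # SSRCs listed in a BYE

    def on_sdes(self, ssrc, cname):
        if ssrc not in self.departed:
            self.active[ssrc] = cname

    def on_bye(self, ssrcs, reason=None):
        # The RC field says how many SSRCs the BYE lists.
        for ssrc in ssrcs:
            self.active.pop(ssrc, None)
            self.departed.add(ssrc)
        return reason           # optional text for the user interface

    def accept(self, ssrc):
        """False for RTP/RTCP packets from sources that have left."""
        return ssrc not in self.departed
```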
RTCP APP: Application-Defined RTCP Packets
The application-defined packet name is a four-character prefix intended to uniquely identify this extension, with each character being chosen from the ASCII character set.
Application-defined packets are used for nonstandard extensions to RTCP, and for experimentation with new features.
Experimenters use APP to try new features, and then register new packet types if the features have wider use.
Several applications generate APP packets, so implementations should be prepared to ignore unrecognized APP packets.
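Building and dispatching APP packets can be sketched as below. The sketch assumes the RFC 3550 layout: the 5-bit subtype occupies the RC field, the packet type is 204, and the header is followed by the SSRC, the four-character name, and application data that is a multiple of 32 bits long. The helper names and the `handlers` dictionary are hypothetical; the point is that an unknown name is simply ignored.

```python
import struct

RTCP_APP = 204

def build_app(ssrc, name, subtype=0, data=b""):
    """Encode an RTCP APP packet: header, SSRC, 4-character ASCII name, data."""
    if len(name) != 4 or not name.isascii():
        raise ValueError("name must be exactly four ASCII characters")
    if len(data) % 4:
        raise ValueError("application data must be a multiple of 32 bits")
    length = (12 + len(data)) // 4 - 1           # 32-bit words minus one
    header = struct.pack("!BBHI", (2 << 6) | (subtype & 0x1F), RTCP_APP, length, ssrc)
    return header + name.encode("ascii") + data

def handle_app(packet, handlers):
    """Dispatch on the name; unrecognized names are ignored (returns None)."""
    name = packet[8:12].decode("ascii", errors="replace")
    handler = handlers.get(name)
    return handler(packet[12:]) if handler else None
```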
Audio Capture, Digitization, and Framing
Audio capture devices can produce samples with:
8-, 16-, or 24-bit resolution,
Linear, µ-law, or A-law quantization,
Rates between 8,000 and 96,000 samples per second, mono or stereo.
It may be necessary to convert the media to an alternative format before the media can be used by the codec, for example, changing the sample rate or converting from linear to µ-law quantization.
Many speech codecs perform voice activity detection with silence suppression.
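As an example of the linear-to-µ-law conversion mentioned above, the classic G.711 µ-law encoding step can be written per sample. It adds a bias of 132, clips at 32,635, and forms a 3-bit segment plus a 4-bit mantissa, then inverts the bits of the result. This is a sketch of the widely used reference algorithm, not a tested codec; the function names are hypothetical.

```python
BIAS = 0x84      # 132, added before locating the segment
CLIP = 32635     # largest magnitude that still fits after adding BIAS

def linear_to_ulaw(sample):
    """Convert one signed 16-bit linear PCM sample to an 8-bit mu-law code."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    exponent = max(0, magnitude.bit_length() - 8)      # segment number 0..7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F    # 4 bits within the segment
    return ~(sign | (exponent << 4) | mantissa) & 0xFF

def convert_buffer(samples):
    """Linear 16-bit samples -> mu-law bytes."""
    return bytes(linear_to_ulaw(s) for s in samples)
```

Silence (0) encodes as 0xFF, and the positive and negative extremes encode as 0x80 and 0x00.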
Most video codecs use inter-frame compression, which introduces delay.
YUV to RGB conversion
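YUV to RGB conversion is a per-pixel linear transform. The sketch below assumes full-range (JFIF-style) BT.601 coefficients with 8-bit components. Capture devices often deliver limited-range or BT.709 video instead, which needs different constants. The function names are hypothetical.

```python
def clamp(x):
    """Round and limit to the 8-bit range."""
    return max(0, min(255, int(round(x))))

def yuv_to_rgb(y, u, v):
    """Full-range BT.601 (JFIF) conversion for one 8-bit pixel."""
    d, e = u - 128, v - 128          # chroma components centred on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp(r), clamp(g), clamp(b)
```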
Use of Prerecorded Content
RTP makes no distinction between live and prerecorded media, and senders generate data packets from compressed frames in the same way.
First, the sender must generate a new SSRC and choose random initial values for the RTP timestamp and sequence number.
During the streaming process, the sender must be prepared to handle SSRC collisions and should generate and respond to RTCP packets for the stream.
Also, if the sender implements a control protocol, such as RTSP, that allows the receiver to pause or seek within the media stream, the sender must keep track of such interactions so that it can insert the correct sequence number and timestamp into RTP data packets.
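The sender-side bookkeeping for prerecorded content might look like the sketch below. The class and method names are hypothetical. The sketch advances the RTP timestamp with the amount of media actually sent, so a seek does not make the timeline jump; this is one possible policy, not the one RTSP mandates. On an SSRC collision it picks a new random SSRC and returns the old one so that the caller can send an RTCP BYE for it, as RFC 3550 describes.

```python
import secrets

class PrerecordedStream:
    """Hypothetical sender state for streaming stored media over RTP."""

    def __init__(self, clock_rate):
        self.clock_rate = clock_rate
        self.ssrc = secrets.randbits(32)        # new SSRC for this session
        self.seq = secrets.randbits(16)         # random initial sequence number
        self.ts_base = secrets.randbits(32)     # random initial RTP timestamp
        self.sent_ticks = 0                     # media-clock ticks sent so far
        self.file_position = 0.0                # seconds into the stored file

    def current_timestamp(self):
        return (self.ts_base + self.sent_ticks) & 0xFFFFFFFF

    def next_header_fields(self, frame_ticks):
        """Return (seq, timestamp) for the next frame, then advance both."""
        fields = (self.seq, self.current_timestamp())
        self.seq = (self.seq + 1) & 0xFFFF
        self.sent_ticks += frame_ticks
        self.file_position += frame_ticks / self.clock_rate
        return fields

    def seek(self, position_seconds):
        """Jump within the file; sequence numbers and timestamps stay continuous.

        Returns the next (seq, timestamp), e.g. to report in a control-protocol reply.
        """
        self.file_position = position_seconds
        return self.seq, self.current_timestamp()

    def on_ssrc_collision(self):
        """Pick a new random SSRC; return the old one so a BYE can be sent for it."""
        old = self.ssrc
        while self.ssrc == old:
            self.ssrc = secrets.randbits(32)
        return old
```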
Fragmentation of a Media Frame into RTP Packets
The fragmentation process is critical to the quality of the media in the presence of packet loss.
The ability to decode each fragment independently is desirable; otherwise, loss of a single fragment will result in the entire frame being discarded.
When multiple RTP packets are generated for each frame, the sender must choose between sending the packets in a single burst and spreading their transmission across the framing interval.
Sending the packets in a single burst reduces the end-to-end delay but may overwhelm the limited buffering capacity of the network or receiving host.
It is therefore recommended that the sender spread the packets out in time across the framing interval.
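The two sending strategies can be compared with a small scheduling helper. This is a sketch with a hypothetical function name. It assumes evenly spaced send times are an acceptable way to spread packets across the framing interval, and all times are in seconds.

```python
def send_schedule(n_packets, frame_start, frame_interval, burst=False):
    """Send times for the packets of one frame.

    burst=True sends everything at frame_start (lowest delay); otherwise the
    packets are spaced evenly across the framing interval.
    """
    if burst or n_packets <= 1:
        return [frame_start] * n_packets
    step = frame_interval / n_packets
    return [frame_start + i * step for i in range(n_packets)]
```

A 4-packet frame in a 40 ms interval goes out at 0, 10, 20, and 30 ms instead of all at once.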
Packet Reception – Input queues
Packet reception and playout routines are separated by input queues (see figure).
It is important to store the exact arrival time, M, of RTP data packets in order to calculate interarrival jitter.
The arrival time should be measured according to a local reference wall clock, T, converted to the media clock rate, R.
Since the receiver usually does not have such a clock, the arrival time is calculated by sampling the reference clock (typically the system wall clock time) and converting it to the local media timeline:
$M = T \times R + \text{offset}$
where the offset is used to map from the reference clock to the media timeline, in the process correcting for skew between the media clock and the reference clock.
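Arrival-time conversion and the interarrival jitter it feeds can be sketched together. The sketch assumes the mapping M = T × R + offset, with R the media clock rate. It uses `time.monotonic()` as the reference clock, a choice made here because it does not jump when the system time is adjusted. The jitter update follows the RFC 3550 definition, J += (|D| - J) / 16, where D is the change in relative transit time between consecutive packets. Class names are hypothetical, and RTP timestamp wraparound is ignored.

```python
import time

class ArrivalClock:
    """Arrival times in media-clock units: M = T * R + offset."""

    def __init__(self, clock_rate, offset=0.0):
        self.clock_rate = clock_rate   # R, e.g. 8000 for narrowband audio
        self.offset = offset           # maps reference clock to media timeline

    def arrival_time(self, wall_clock=None):
        # Sample the reference clock T unless a value is supplied.
        t = time.monotonic() if wall_clock is None else wall_clock
        return t * self.clock_rate + self.offset


class JitterEstimator:
    """Interarrival jitter estimate in media-clock units (RFC 3550, 6.4.1)."""

    def __init__(self):
        self.jitter = 0.0
        self.prev = None   # (arrival M, RTP timestamp) of the previous packet

    def update(self, arrival, rtp_ts):
        if self.prev is not None:
            d = (arrival - self.prev[0]) - (rtp_ts - self.prev[1])
            self.jitter += (abs(d) - self.jitter) / 16
        self.prev = (arrival, rtp_ts)
        return self.jitter
```

A packet arriving 176 ticks after its predecessor but stamped only 160 ticks later gives D = 16 and raises the estimate from 0 to 1.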
Disruption of Interpacket Timing during
There are bursts when several packets arrive at once.
There are gaps when no packets arrive.
Packets may even arrive out of order.
The receiver does not know when data packets are going to arrive, so it should be prepared to accept packets in bursts, and in any order.
The Playout Buffer
Data packets are extracted from their input queue and inserted into a source-
specific playout buffer sorted by their RTP timestamps.
Frames are held in the playout buffer for a period of time to smooth timing
variations caused by the network.
Holding the data in a playout buffer also allows the pieces of fragmented frames
to be received and grouped, and it allows any error correction data to arrive.
The frames are then decompressed, any remaining errors are concealed, and the
media is rendered for the user.
A single buffer may be used to compensate for network timing variability and as a
decode buffer for the media codec.
It is also possible to separate these functions, using separate buffers for jitter removal and decoding.
The Playout Buffer Data Structures
The playout buffer comprises a time-ordered linked list of nodes.
Each node represents a frame of media data, with associated timing information.
The data structure for each node contains:
pointers to the adjacent nodes,
the arrival time,
the desired playout time for the frame,
and pointers to both:
the compressed fragments of the frame (the data received in RTP packets), and
the uncompressed media data.
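The node structure and timestamp-ordered insertion can be sketched in Python. The names are hypothetical. The search starts from the newest node because most packets arrive in order. RTP timestamp wraparound and ordering of fragments by sequence number are left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PlayoutNode:
    rtp_timestamp: int
    arrival_time: float
    playout_time: Optional[float] = None
    fragments: List[bytes] = field(default_factory=list)   # compressed data from RTP packets
    media: Optional[bytes] = None                           # uncompressed data after decoding
    prev: Optional["PlayoutNode"] = field(default=None, repr=False, compare=False)
    next: Optional["PlayoutNode"] = field(default=None, repr=False, compare=False)

class PlayoutBuffer:
    """Time-ordered doubly linked list of frames for one source."""

    def __init__(self):
        self.head = None
        self.tail = None

    def insert(self, rtp_ts, arrival, fragment):
        """Add a fragment, creating its frame node in timestamp order if needed."""
        node = self.tail
        while node is not None and node.rtp_timestamp > rtp_ts:
            node = node.prev                    # walk back from the newest frame
        if node is not None and node.rtp_timestamp == rtp_ts:
            node.fragments.append(fragment)     # another piece of an existing frame
            return node
        new = PlayoutNode(rtp_ts, arrival, fragments=[fragment])
        new.prev = node                         # link after node (or at the head)
        new.next = node.next if node else self.head
        if new.next is not None:
            new.next.prev = new
        else:
            self.tail = new
        if node is not None:
            node.next = new
        else:
            self.head = new
        return new

    def timestamps(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.rtp_timestamp)
            node = node.next
        return out
```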
Calculation of clock skew:
observe the rate of the sender clock—the RTP
timestamp—and compare with the local clock.
If TR(n) is the RTP timestamp of the nth packet received,
and TL(n) is the value of the local clock at that time, then
the clock skew can be estimated as follows:
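One way to express this estimate, assuming both clocks have been converted to the same units, is the ratio of elapsed sender time to elapsed local time: skew ≈ (TR(n) - TR(0)) / (TL(n) - TL(0)). The sketch below computes that ratio and, as a swapped-in alternative, a least-squares slope through all (TL, TR) pairs, which is less sensitive to jitter on any single packet. A value above 1 means the sender clock runs fast relative to the local clock. Function names are hypothetical.

```python
def skew_endpoints(tr, tl):
    """Elapsed RTP time divided by elapsed local time over the window."""
    return (tr[-1] - tr[0]) / (tl[-1] - tl[0])

def skew_least_squares(tr, tl):
    """Slope of the best-fit line TR = a * TL + b (the skew estimate a)."""
    n = len(tr)
    mean_r, mean_l = sum(tr) / n, sum(tl) / n
    num = sum((l - mean_l) * (r - mean_r) for r, l in zip(tr, tl))
    den = sum((l - mean_l) ** 2 for l in tl)
    return num / den
```

If the sender's timestamps advance 1601 ticks for every 1600 local ticks, both estimates give 1.000625.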
The Playout calculation
1. The sender timeline is mapped to the local playout timeline, compensating for
the relative offset between sender and receiver clocks, to derive a base time
for the playout calculation
2. If necessary, the receiver compensates for clock skew relative to the sender,
by adding a skew compensation offset that is periodically adjusted to the
3. The playout delay on the local timeline is calculated according to a sender-
related component of the playout delay and a jitter-related component
4. The playout delay is adjusted:
if the route has changed,
if packets have been reordered,
if the chosen playout delay causes frames to overlap, or
in response to other changes in the media.
5. Finally, the playout delay is added to the base time to derive the actual playout
time for the frame.
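Steps 1, 2, 3, and 5 can be combined into one function, with all values in media-clock units. This is a minimal sketch under several assumptions. The base offset is taken from the first packet only (real receivers usually track the minimum transit time over a window). The jitter-related component is a hypothetical multiple `k` of the jitter estimate. Step 4's adjustments are left to the caller. All names are hypothetical.

```python
def base_offset(first_arrival, first_rtp_ts):
    """Step 1: offset mapping the sender's RTP timeline onto the local timeline."""
    return first_arrival - first_rtp_ts

def playout_time(rtp_ts, offset, skew_offset, sender_delay, jitter, k=3.0):
    """Local playout time for a frame (step 4 adjustments not included)."""
    base_time = rtp_ts + offset            # step 1: sender timeline -> local timeline
    base_time += skew_offset               # step 2: periodically adjusted skew compensation
    delay = sender_delay + k * jitter      # step 3: sender- and jitter-related components
    return base_time + delay               # step 5: add the delay to the base time
```

For example, a frame stamped 1600 with an offset of 500, skew compensation of 2, a sender component of 160, and jitter of 10 would play out at 2292 on the local timeline.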
Key points of Chapter 1
RTP & RTCP
• Media transmission and reception over network
• Translator & Mixer
• RTP Session
• RTP Stream
• RTP Packet format
• SSRC & CSRC
• RTCP packet format
QoS for multimedia network
• Scheduling
• WFQ
• Policing
• Packet Classifications
• DSCP/TOS
• T-Spec/R-Spec
• Int-Serv
• Diff-Serv
• AF/EF
• DSCP
• Int-Serv vs. Diff-Serv